Abstracts


PI 151705543, 

Author="Tieling Chen", 

Title="Detail Preserving Sorted Difference Filter",

Abstract: A detail preserving filter that uses the intensity differences between a pixel 
and its neighbor pixels to eliminate impulse noises with minor variations is proposed. 
The absolute values of the intensity differences are sorted into a sequence in 
ascending order and the value at a specific position is used to determine whether the 
pixel under processing is an impulse noise or not. The method to find the specific 
position is provided and its feasibility is discussed. Theoretically, impulse noises can
be correctly selected when the density of the noises is not heavy. The filter preserves
more fine details than the standard median filter in general. In the situation when the 
distribution of impulse noises has minor variations, the filter works better than the 
commonly used adaptive median filter and also produces cleaner results. 

PI 151738585, 

author= "Vladimir A. Kulyukin", 

title="GreedyHaarSpiker: An Algorithm for In Situ Detection of Highway Lane
Boundaries with 1D Haar Wavelet Spikes",

Abstract: An algorithm is presented for in situ vision-based detection of highway
lane boundaries on a Raspberry Pi computer coupled to a Raspberry Pi camera. The
Raspberry Pi unit is placed inside a Jeep Wrangler, next to the windshield, and is
powered through a 12V-to-5V car charger. The algorithm, called GreedyHaarSpiker,
is based on the detection of 1D Haar Wavelet spikes in 1D Ordered Haar Wavelet
Transforms of image rows. To obtain experimental video data for daytime driving, the
author drove his Jeep with the installed Raspberry Pi unit on a sunny day in September
2016 (run 1) and a cloudy day with light rain in November 2016 (run 2) at a speed of
55-60 miles per hour on Route 30, a two-lane Northern Utah highway. To obtain
video data of driving on snowy roads and night driving, the author drove his Jeep on
the same highway and at the same speed on a day after a heavy snowfall (run 3) in
January 2017 and on the same day after sunset (run 4). Each run was approximately
35 miles long. Each video was partitioned into frames and a sample of consecutive
360 x 240 PNG frames was selected from each captured video. The performance of the
algorithm was tested in situ on a Raspberry Pi 3 Model B ARMv8 1GB RAM computer
on each of the four frame samples. The algorithm is implemented in Python 2.7.9 with
OpenCV 3.0. The current implementation processes 20 frames per second.



PI 151714561, 

author="Poorva Waingankar and Sangeeta Joshi", 

title="Video Compression using Efficient Encoding Techniques for Low Bit Rate
Applications", 

Abstract: This paper presents the use of the Accordion technique along with modified Run
Length Encoding for video compression, which consists of exploiting the high amount
of temporal redundancy present in videos by converting it to spatial redundancy
and using the 2D DCT. The video compression steps are either optimized or completely
revamped to meet the compression and video quality requirements of mobile
applications. The technique is simple enough to suit lower end CPUs and achieves a
very good compression ratio to suit the narrow bandwidth environments of wireless
networks, without compromising on the quality of the video.


PI 151733581, 

author="Ayush Purohit and Shardul Singh Chauhan", 
title="A Precise Technique for Hand Gesture Recognition", 


Abstract: Vision based methodologies provide a more natural and proficient result
than the traditional strategies that have been used for hand gesture
recognition. In this paper, we propose a video based hand gesture recognition approach. Our
approach commences by acquiring the video frame from a source and converting it
into a 2D binary frame using the YCbCr color space. We implement opening and closing
operations to filter the noise from the frame. To track and segment the hand
gesture, we use a Kalman filter and the convex hull along with convexity defects for
detecting hand regions in the frame. Our framework can currently recognize six kinds of hand
gestures.

PI 151748594, 

author="D. Boopathy and M. Sundaresan", 

title="Securing Images on Cloud using Multidimensional Approach",

Abstract: Encryption is one of the methodologies used to maintain and protect
data confidentiality. Depending on the requirements of their data types, users need to adopt and
implement one of the existing methods. But those encryption methods and
standards may not fall within the regulations of the user data's country when the users
are from different geographical locations. Some of the existing methods have already been
compromised by hackers, and some government agencies are forcing service providers
based in their countries to hand over encrypted information in the name of
maintaining the country's security. It is very difficult to manage these threats with one
method. The proposed method tries to reduce the threats as far as possible by approaching
them from different points of view. In the proposed method, images and a block-based
encryption method are used to protect normal and sensitive images from
unauthorized access. The proposed method is tested on all proposed encryption types
using greyscale images in two scenarios: Different Images One Type (DIOT) and
Single Image All Types (SIAT). The results of the proposed method are evaluated
using PSNR, MSE, image size, and histograms to verify the image's integrity.



PI 151747592, 

author="M. Anyayahan and M. Balinas and A. La Madrid and M. Laurel and C. 
Lopez and R. Tolentino", 

title="Rotational invariant Real Time Text Recognition",

Abstract: In everyday life, people constantly encounter different text images. These text
images contain linear or multi-oriented text in either printed or written form.
Because text can appear at different orientations in an image, recognizing it is a challenge
for Optical Character Recognition. In this paper, real time recognition of text
in different rotational variations is presented. The image is acquired
by a camera and processed in Microsoft Visual Studio. The
detection and recognition of text with different rotational variations are achieved by
detecting the direction of tilt and computing the angle of tilt through the use of
geometric and trigonometric principles; after counter rotation, the text is recognized
by the Tesseract optical character recognition engine.
Detail Preserving Sorted Difference Filter

Tieling Chen 

Department of Mathematical Sciences, University of South Carolina Aiken 
471 University Parkway, Aiken, SC 29803, USA 
tielingc@usca.edu


Abstract 

A detail preserving filter that uses the intensity differences
between a pixel and its neighbor pixels to
eliminate impulse noises with minor variations is proposed.
The absolute values of the intensity differences
are sorted into a sequence in ascending order
and the value at a specific position is used to determine
whether the pixel under processing is an impulse
noise or not. The method to find the specific position
is provided and its feasibility is discussed. Theoretically,
impulse noises can be correctly selected when
the density of the noises is not heavy. The filter
preserves more fine details than the standard median
filter in general. In the situation when the distribution
of impulse noises has minor variations, the filter works
better than the commonly used adaptive median filter
and also produces cleaner results.

Keywords: Sorted difference filter, median filter,
adaptive median filter, impulse noises, detail preserving
filter.


1 Introduction 

The standard median filter replaces the intensity value 
of a pixel being processed with the median value of 
the intensities in its neighborhood. The filter works
very well at eliminating impulse noises, such as
salt-and-pepper noises, when the density of the noises is
not heavy ([1], [3], [11]). Impulse noises usually have
a unipolar or bipolar distribution of intensities at one 
end or the two ends of the intensity range. In an image 
corrupted by impulse noises, white dots appearing in 
dark regions are called salts and dark dots appearing 
in bright regions are called peppers. There are quite 
a few impulse noise models adopted in research ([5]), 
among which the one with the following probability 
density function is widely used. 

\[
p(z) =
\begin{cases}
P_1, & \text{pepper: } z = 0, \\
P_2, & \text{salt: } z = L - 1, \\
1 - P_1 - P_2, & \text{noise free},
\end{cases}
\tag{1}
\]


where the range of intensities is the interval $[0, L - 1]$,
and $P_1$ and $P_2$ are the corresponding probabilities
for peppers and salts, usually called densities. When
$P_1 = P_2$, the noise is called a salt-and-pepper noise,
and usually $P = P_1 + P_2$ refers to its total density.
Because the intensities of peppers and salts are on
the two ends of the intensity range of a corrupted
image, they can be removed by the standard median
filter effectively when $P$ is low. Experimentally, when
$P < 20\%$, the performance of the standard median
filter is perceptually satisfactory.

In fact, the standard median filter still works well
even when the noises are not perfect impulses but
display minor variations around the impulses in their
distributions. The probability density function of
this type of noise can be expressed as

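\[
p(z) =
\begin{cases}
p_1(z), & \text{pepper: } z \in [0, \varepsilon_1], \\
p_2(z), & \text{salt: } z \in [L - 1 - \varepsilon_2, L - 1], \\
p_3(z), & \text{noise free},
\end{cases}
\tag{2}
\]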

where the small tolerances $\varepsilon_1$ and $\varepsilon_2$ give two narrow
intervals at the two ends of the intensity range $[0, L - 1]$,
in which noises occur with probabilities described
by the functions $p_1(z)$ and $p_2(z)$, and
$p_3(z) = 1 - p_1(z) - p_2(z)$ is the probability that a pixel with the
intensity $z$ is noise free. Denote by

\[
P = \int_{0}^{\varepsilon_1} p_1(z)\,dz + \int_{L - 1 - \varepsilon_2}^{L - 1} p_2(z)\,dz
\]

the total density of the impulse noise with minor variations.
The shapes of the functions $p_1(z)$ and $p_2(z)$
are not the main concern for the performance of the
standard median filter if $\varepsilon_1$ and $\varepsilon_2$ are relatively small.
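As an illustration of models (1) and (2), the following Python sketch corrupts a grayscale image stored as a list of rows. It is not taken from the paper: the function name `add_impulse_noise`, the uniform shape chosen for $p_1(z)$ and $p_2(z)$, and the integer tolerances `eps1` and `eps2` are assumptions made for the example. With both tolerances set to 0 it reduces to model (1).

```python
import random


def add_impulse_noise(image, p1, p2, eps1=0, eps2=0, levels=256, seed=None):
    """Return a noisy copy of a grayscale image (list of rows of ints).

    p1 and p2 are the total pepper and salt densities. Peppers are drawn
    uniformly from [0, eps1] and salts from [levels - 1 - eps2, levels - 1];
    the uniform shape is an assumption made only for this sketch.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for value in row:
            u = rng.random()
            if u < p1:
                new_row.append(rng.randint(0, eps1))  # pepper
            elif u < p1 + p2:
                new_row.append(rng.randint(levels - 1 - eps2, levels - 1))  # salt
            else:
                new_row.append(value)  # noise free
        noisy.append(new_row)
    return noisy


clean = [[128] * 64 for _ in range(64)]
perfect = add_impulse_noise(clean, 0.05, 0.05, seed=0)               # model (1)
varied = add_impulse_noise(clean, 0.05, 0.05, eps1=8, eps2=8, seed=0)  # model (2)
```

Here `p1 = p2 = 0.05` corresponds to a total density of 10%.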

However, the standard median filter is not detail
preserving, with disadvantages including signal weakening
and non-noisy image pixel corruption ([6]). This
is because the intensity of a pixel usually is not exactly
the median value in its neighborhood covered by the
filter mask, so it is altered during the process. More
specifically, tiny components of objects in an image
are usually eroded by the filter. For example, Figure
1 shows an image corrupted by salt-and-pepper noises
with a total density $P = 10\%$ and the result processed
by the standard median filter with size 5 by 5. In
the resultant image, although the noises are removed,
the fine details are heavily eroded and blurred by the
filter.





Figure 1: (a) A test image corrupted by salt-and-pepper
noises with density 10%. (b) The result by
the median filter with size 5 by 5.

Adaptive median filters are more efficient in detail
preserving than the standard median filter on images
with perfect impulse noises ([2], [3], [4], and [9]). With
these filters each pixel is examined and classified as either
a noise or a noise free pixel. A designated noise
is then processed with the standard median filter or a
modified one. For example, a truncated median filter
is used to process designated noises in ([7]); a decision
based filter using multiple thresholds with multiple
neighborhood information of the center pixel in the
filter window is proposed to restore images corrupted
by salt and pepper impulse noise ([8]); with modified
median filters, a decision making filtering technique can
be used together with adaptive filters to improve efficiency
([10]).

The essence of the design of an adaptive median
filter lies in improving the accuracy of noise identification.
We use the adaptive median filter in [3] as an
example to explain the concept. The filter changes
the size of its mask to find a proper median intensity
value in a neighborhood of a pixel. If in the same
neighborhood the intensity value of the pixel being
processed is not an extreme value, its intensity value
is not changed; otherwise it is replaced with the
median intensity value. The method tries not to replace
the intensity value of a pixel unless it has to do
so, in which case either the pixel has an extreme value,
which makes it a candidate for an impulse noise, or the filter
reaches its maximum size and the standard median
filter must be used.
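The decision rule described above can be sketched for a single pixel as follows. This is an illustrative reading of the adaptive median filter of [3], not code from the paper; the function names, the starting window size of 3, and the replicated image border are assumptions.

```python
def window_values(image, x, y, half):
    """Intensities in the (2*half+1) x (2*half+1) window centered at (x, y).

    Coordinates outside the image are clamped to the border (an assumption).
    """
    h, w = len(image), len(image[0])
    return [
        image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
        for dy in range(-half, half + 1)
        for dx in range(-half, half + 1)
    ]


def adaptive_median_at(image, x, y, max_size=5):
    """Output intensity of an adaptive median filter at pixel (x, y)."""
    size = 3
    while True:
        vals = sorted(window_values(image, x, y, size // 2))
        z_min, z_med, z_max = vals[0], vals[len(vals) // 2], vals[-1]
        if z_min < z_med < z_max:
            # The median is not an extreme value: keep the pixel unless the
            # pixel itself is an extreme value of the window.
            z = image[y][x]
            return z if z_min < z < z_max else z_med
        size += 2
        if size > max_size:
            # Maximum mask size reached: fall back to the median.
            return z_med


# A salt of 250 next to a salt of 255 is not an extreme value of its window,
# so this rule leaves it untouched.
patch = [[10, 20, 30], [40, 250, 60], [70, 80, 255]]
kept = adaptive_median_at(patch, 1, 1, max_size=3)
```

The small `patch` shows why impulses with minor variations can survive: the center value 250 lies strictly between the window minimum and maximum, so it is classified as noise free.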

Generally, adaptive median filters work with the
assumption that noises are impulses without variations,
usually taking the two extreme intensity values
0 and $L - 1$ in the intensity range $[0, L - 1]$. Unfortunately,
this perfect pattern is not always present
in applications. In many situations the distribution
of the noise intensities shows minor variations around
the impulses, which frequently occur when the images
are stored with compression, such as in JPEG format.
In these situations adaptive median filters may not
remove all the noises.

For example, Figure 2 shows an image corrupted
by salt-and-pepper noises with minor variations on the
impulses and the result obtained by the adaptive median
filter with maximum size 5 x 5. The test image
was cropped from a test image used in [3], originally
saved in TIFF format and then converted to JPEG format.
In the TIFF format, the intensity distribution of
the image displays a perfect pattern of two impulses
at the two ends, while in the JPEG format, it shows
two bumps at the two ends of the distribution, implying
that the noises are not perfect impulses. In the JPEG
image the noises cannot be completely removed by the
adaptive median filter.




Figure 2: (a) An image corrupted by salt-and-pepper
noises. (b) The result of the adaptive median filter
with maximum size of 5 x 5 on the image in JPEG
format. Some noises still remain in the image. (c) The
intensity distribution of the image in TIFF format.
(d) The intensity distribution of the image in JPEG
format.


Our motivation is to design a filter that efficiently
removes impulse noises with minor variations while
keeping the non-noisy pixels unaltered. When impulse
noises have minor variations, adaptive median filters
cannot remove the noises cleanly and the standard
median filter ruins the fine details. The proposed filter
introduced in the next section classifies noises and
non-noisy pixels with high accuracy even though the
impulse noises have minor variations, and therefore details
can be well preserved.

The remainder of the paper is organized as follows:
Section 2 introduces the new filter and analyzes
the mathematical mechanism behind it. Section
3 demonstrates the effectiveness of the new filter by
comparing it with the standard median filter and the
adaptive median filter used in [3]. Section 4 gives the
summary and conclusions.


2 Sorted difference filter 

For a pixel $(x, y)$ in a given image, denote by

$I(x, y)$ the intensity at the pixel $(x, y)$, and
$S_{xy}$ a neighborhood centered at $(x, y)$.

Suppose $S_{xy}$ is encompassed by a filter centered at
$(x, y)$ with a size $m \times n$. For every pixel $(x', y') \in S_{xy}$
we compute the absolute value of the intensity difference
between the pixel $(x', y')$ and the center pixel
$(x, y)$,

\[
d_{x'y'} = |I(x, y) - I(x', y')|,
\]

and call it the absolute difference for the pixel $(x', y')$.
Denote by $D_{xy}$ the sorted sequence of all the absolute
differences found in $S_{xy}$ in ascending order,

\[
D_{xy} = \text{the sorted sequence of } \{ d_{x'y'} \mid (x', y') \in S_{xy} \}.
\]

Let $l$ be the length of the sorted sequence $D_{xy}$; then
$l = mn$. We use an index $i$ to access a particular value
in $D_{xy}$, so that

\[
D_{xy} = \{ D_{xy}[i] \}_{i=0}^{l-1}.
\]

It is easy to see that $D_{xy}[0] = 0$ and $D_{xy}[i]$ is
non-decreasing in $i$.
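The construction of the sorted sequence can be sketched in a few lines of Python. The function name `sorted_differences`, the list-of-rows image layout, odd window sizes, and the clamped border are assumptions of this sketch rather than details given in the paper.

```python
def sorted_differences(image, x, y, m=3, n=3):
    """Sorted absolute differences between pixel (x, y) and its m x n window.

    m is the window height and n the window width, both assumed odd.
    Coordinates outside the image are clamped to the border.
    """
    h, w = len(image), len(image[0])
    center = image[y][x]
    diffs = []
    for dy in range(-(m // 2), m // 2 + 1):
        for dx in range(-(n // 2), n // 2 + 1):
            yy = min(max(y + dy, 0), h - 1)
            xx = min(max(x + dx, 0), w - 1)
            diffs.append(abs(center - image[yy][xx]))
    return sorted(diffs)


# A single salt in a nearly flat region: the first entry is the center's own
# difference (0), followed by an abrupt jump.
patch = [[10, 10, 10], [10, 255, 10], [10, 10, 12]]
D = sorted_differences(patch, 1, 1)
```

For this `patch` the sequence has length $l = 9$, starts at 0, and every later entry is at least 243.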

We use a threshold $T$ on a specific value $D_{xy}[i^*]$,
$0 < i^* < l$, to select noise candidates. The threshold
$T$ is an experimental value that can be adjusted in
applications, while the index $i^*$ is mainly determined
by the size of the filter and the density of the impulse
noises in the image.

The threshold $T$ is selected in such a way that
noises with absolute differences higher than $T$ can be
singled out. To determine $i^*$, suppose the densities
of salts and peppers are both $p$; then empirically there
are about $pl$ salts and $pl$ peppers in $S_{xy}$. Let $i^*$ be
the nearest whole number no less than $pl$, given by
the ceiling function

\[
i^* = \lceil pl \rceil. \tag{3}
\]

The index $i^*$ is a property of the filter and it
applies to all the pixels under processing. Once $i^*$ is
determined, it does not change during the processing.

If $(x, y)$ is a noise, say a salt, because we expect
there are $i^*$ salts in $S_{xy}$, then for $0 \le i < i^*$,
$D_{xy}[i] = 0$ if the impulse distribution has no variations,
or $D_{xy}[i] < T$ for a proper threshold $T$ if the
impulse distribution has minor variations. The value
$D_{xy}[i^*]$ is the first absolute difference between the salt
noise and the background with an abrupt increment.
If $D_{xy}[i^*] \ge T$ then the salt noise can be singled out.
The situation is the same when $(x, y)$ is a pepper.

For every pixel $(x, y)$, let $n_{xy}$ be the number of
pixels in $S_{xy}$ such that the absolute differences are
less than the threshold $T$. Obviously $D_{xy}[i] < T$ for
$0 \le i < n_{xy}$ and $D_{xy}[n_{xy}] \ge T$. From the above
analysis, if $(x, y)$ is a noise, then statistically $n_{xy} \le i^*$.
If $(x, y)$ is a non-noisy pixel and $n_{xy} > i^*$ then the
pixel can be easily differentiated from a noise.

The number $n_{xy}$ changes with the pixel location.
The filter relies on the following condition being
satisfied for a general non-noisy pixel:

\[
n_{xy} > i^*. \tag{4}
\]

We will see that the condition is generally satisfied
if the noise density is not too heavy. For every pixel
$(x, y)$, find the sequence $D_{xy}$ in its neighborhood $S_{xy}$
encompassed by the filter. If $D_{xy}[i^*] \ge T$ then $(x, y)$
is regarded as a noise and its value is replaced by the
median intensity value in $S_{xy}$; otherwise, the pixel is
treated as a non-noise pixel and its value is unchanged.
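The whole procedure fits in a short Python sketch. It is an illustration under stated assumptions, not the author's implementation: the function name, the square window, the clamped border, and the optional `index` argument are choices made here. The test is written as `>=` so that a threshold of 0 reproduces the standard median filter, which is the behavior the text describes for $T = 0$.

```python
import math


def sorted_difference_filter(image, p, threshold, size=5, index=None):
    """Replace a pixel by the window median only if it is flagged as noise.

    image: list of rows of ints. p: assumed density of salts (and of peppers).
    index: position checked in the sorted differences; defaults to ceil(p * l).
    """
    half = size // 2
    h, w = len(image), len(image[0])
    l = size * size
    i_star = math.ceil(p * l) if index is None else index
    i_star = min(i_star, l - 1)  # keep the index inside the sequence
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            values = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-half, half + 1)
                for dx in range(-half, half + 1)
            ]
            center = image[y][x]
            diffs = sorted(abs(center - v) for v in values)
            if diffs[i_star] >= threshold:
                out[y][x] = sorted(values)[l // 2]  # window median
    return out


# A one-pixel-wide bright line on a flat background survives the filter,
# while an isolated salt is removed.
line = [[200 if x == 4 else 100 for x in range(9)] for _ in range(9)]
kept = sorted_difference_filter(line, p=0.1, threshold=40)

salt = [[100] * 9 for _ in range(9)]
salt[4][4] = 252
cleaned = sorted_difference_filter(salt, p=0.1, threshold=40)
```

With `p=0.1` and a 5 x 5 window the checked index is 3. The thin line has five pixels of equal intensity in every window that covers it, so its fourth sorted difference is 0 and it is left alone, whereas the isolated salt has a fourth sorted difference of 152 and is replaced.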



The objective of the method is to apply the standard
median filter only to the impulse noises and
keep the non-noisy pixels unaltered. The effect of the
method is determined by the correctness of classifying
noises and non-noisy pixels.

For a non-noisy pixel $(x, y)$, there are a few factors
affecting $n_{xy}$. If the absolute differences of all
non-noisy pixels in $S_{xy}$ are less than $T$ and the absolute
differences of all noises in $S_{xy}$ are not less than
$T$, which usually occurs when $(x, y)$ is in a region
with slow intensity changes, then $n_{xy}$ is approximately
equal to $l - 2pl$ because there are about $2pl$ noises in
$S_{xy}$. We use the closest whole number that is no more
than $l - 2pl$ for $n_{xy}$ in this case,

\[
n_{xy} = \lfloor l - 2pl \rfloor. \tag{5}
\]

For condition (4) to be satisfied, there should be

\[
l - 2pl > pl, \tag{6}
\]

which yields $p < 1/3$. This means when the impulse
noise density is not heavy, a non-noisy pixel can be
correctly identified in general. The filter is equivalent
to the standard median filter if the densities of both
salts and peppers are not less than 1/3, or the overall
density of the impulses is not less than 2/3.

Notice that the non-noisy pixel $(x, y)$ could be on
an edge in the image. If $(x, y)$ is on the boundary between
two regions with certain sizes and the filter size
is relatively small so the boundary is approximately
straight, then generally at least half of the
non-noisy pixels are in the same region as $(x, y)$. Suppose
the fraction of such pixels in the same region is $q$ and
the intensity difference between the two regions is not
less than $T$; then $n_{xy} = (l - 2pl)q$ approximately. We
use

\[
n_{xy} = \lfloor (l - 2pl)\,q \rfloor. \tag{7}
\]

To keep the edge sharp, the pixel $(x, y)$ should not
be altered. This requires that the inequality (4) holds,
which is

\[
(l - 2pl)\,q > pl, \tag{8}
\]

giving

\[
p < \frac{q}{1 + 2q}. \tag{9}
\]

Using the empirical value $q \approx 0.5$, we get $p < 0.25$.
As mentioned in [3], the standard median filter performs
well when $p < 0.2$. Under the same condition,
the new filter works better on edge preserving than
the standard median filter because edge pixels are not
altered.
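For reference, the two density bounds follow from condition (4) by dividing through by $l > 0$. The short derivation below ignores the rounding introduced by the floor and ceiling functions, as the text does.

```latex
\begin{align*}
l - 2pl > pl
  &\iff 1 - 2p > p
   \iff p < \tfrac{1}{3}, \\
(l - 2pl)\,q > pl
  &\iff q - 2pq > p
   \iff q > p\,(1 + 2q)
   \iff p < \frac{q}{1 + 2q}, \\
q = 0.5
  &\implies p < \frac{0.5}{1 + 2 \cdot 0.5} = 0.25 .
\end{align*}
```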

Theoretically, if the threshold $T$ is properly selected
and the densities of the salts and peppers are
not heavy, for example if they are less than 0.25, the
noises and the non-noisy pixels can be recognized correctly
by the new filter. The filter checks $D_{xy}[i^*] \ge T$
to determine whether a standard median filter should
be used at $(x, y)$. The threshold $T$ can be adjusted
in processing to get an optimal result. When $T = 0$,
$D_{xy}[i^*] \ge T$ is always true, so the new filter is
exactly the same as the standard median filter. When
$T$ is large enough that $D_{xy}[i^*] \ge T$ can never be
satisfied, the new filter does not change anything
in the image.

Practically, even though the density $p$ is known
beforehand, the index $i^*$ may not work well to remove
all the noises. This is because digital images are discretely
represented and noises are not perfectly evenly
distributed, and therefore noises may form clusters
with sizes larger than $i^*$. If a noise $(x, y)$ is in such a
cluster and there are more than $i^*$ pixels of the cluster
in $S_{xy}$, then $D_{xy}[i^*]$ is very small and the noise cannot
be detected by the threshold. To remove noise clusters
the threshold should be applied to $D_{xy}[j]$ with
a bigger index $j > i^*$. Such an index $j$ must be no
bigger than $n_{xy}$, otherwise non-noisy pixels cannot be
correctly recognized.

Generally, when the noise density is not heavy
there is a big gap between $i^*$ and $n_{xy}$ for every
non-noisy pixel $(x, y)$. This means a proper index $j$ can
be easily selected such that $i^* < j \le n_{xy}$. For example,
when $p = 0.1$ and the size of the filter is 5 x 5,
which gives $l = 25$, then by (3) $i^* = 3$. To compute
$n_{xy}$, we choose (7) with $q = 0.5$ because the resultant
value is smaller than the value given by (5), and get
$n_{xy} = (l - 2pl)q = 10$. Then $j$ can be selected in the
range $3 < j \le 10$. The gap between $i^*$ and $n_{xy}$ provides
flexibility for the filter to find a proper index $j$
in applications without knowing the actual density of
the impulses. Notice that a big $j$ also removes some
tiny components of objects in an image.
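The arithmetic of this example can be checked with a small helper. The function name `index_bounds` and its return layout are hypothetical; it simply evaluates (3), (5), and (7).

```python
import math


def index_bounds(p, m, n, q=0.5):
    """Return (i_star, n_flat, n_edge) for an m x n filter.

    i_star follows (3), n_flat follows (5) for slowly varying regions, and
    n_edge follows (7) for a pixel on a straight edge with same-side
    fraction q.
    """
    l = m * n
    i_star = math.ceil(p * l)
    n_flat = math.floor(l - 2 * p * l)
    n_edge = math.floor((l - 2 * p * l) * q)
    return i_star, n_flat, n_edge


print(index_bounds(0.1, 5, 5))  # (3, 20, 10)
```

The smaller of the two upper values, 10, is the one that limits the choice of $j$.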

In summary, for a given image corrupted by impulses
with minor variations, select a filter size $l = m \times n$
and set up initial values of the index $j$ and the
threshold $T$. If the impulse density $p$ can be obtained
then $j$ can be initially chosen from $lp < j \le (l - 2pl)q$
with $q = 0.5$; if not, $j$ can still be easily selected because
of the big gap between $i^*$ and $n_{xy}$. For every
pixel $(x, y)$ under processing, sort the absolute differences
of pixels in the neighborhood encompassed by
the filter in ascending order to get the sequence $D_{xy}$.
If $D_{xy}[j] \ge T$, then the standard median filter with
the same filter size is applied; otherwise, the intensity
value at $(x, y)$ is unchanged. Finish processing the
image to get a result. Adjust the threshold $T$ with
each $j$, and also adjust $j$ and the size of the filter as
needed until an optimal result is obtained.

3 Experimental results 

In our experiments, the new filter is compared with 
the standard median filter and the adaptive median 
filter in ([3]). 

The top two images in Figure 3 (a) and (b) display
the results of the new filter and the adaptive median
filter on the test image in Figure 1 (a), respectively.
The image in Figure 3 (a) is obtained by the new
filter with a size 5 x 5, $j = 8$, and $T = 40$. Compared
with the image in Figure 1 (b) obtained by the
standard median filter, the new result is perceptually
much better because it keeps more details of the
watch. The Peak Signal-to-Noise Ratio (PSNR) of the
image in Figure 3 (a) is 30.31, which is clearly
higher than the PSNR of the image in Figure 1 (b),
27.76. Notice that PSNR is only a coarse estimate
of the effectiveness of a denoising method, and the
perceptual effect of the method is also an important
consideration. The image in Figure 3 (b) is the result
obtained by the adaptive median filter. Because
the impulse noises have minor variations, some noises
are not cleanly removed. The PSNR of this resultant
image is 23.86 owing to the remaining peppers and
salts.
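The paper does not spell out its PSNR formula. A minimal sketch of the usual definition, assuming 8-bit images with peak value 255 and comparison against the uncorrupted reference image, is:

```python
import math


def psnr(reference, result, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are lists of rows of ints. Returns infinity for identical images.
    """
    total = 0
    count = 0
    for ref_row, res_row in zip(reference, result):
        for a, b in zip(ref_row, res_row):
            total += (a - b) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / mse)


reference = [[100, 100], [100, 100]]
shifted = [[101, 101], [101, 101]]
value = psnr(reference, shifted)  # mean squared error of 1
```

A single leftover salt or pepper contributes a large squared error, which is consistent with the low PSNR reported for the adaptive median filter result.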

At the bottom of Figure 3, results of two different
methods on the test image in Figure 2 (a) are
displayed. With the new filter, by setting the filter
size to 5 x 5, $j = 10$ and threshold $T = 22$, we get the result
shown in Figure 3 (c). For comparison, the result
obtained by the standard median filter with the same
size is also displayed in Figure 3 (d). Perceptually,
the fine details are kept very well in the image obtained
by the new filter but they are severely blurred
by the standard median filter. The PSNRs are 25.31
for the image in Figure 3 (c) and 24.74 for the image in
Figure 3 (d), with the new filter giving the higher one.
Because the impulse noises have variations in their intensities,
some noises cannot be removed by the adaptive
median filter, as shown in Figure 2 (b); that result is not
acceptable because of the remaining noises.

Figure 3: (a) The result obtained by the new filter on
the test image in Figure 1. (b) The result obtained by
the adaptive median filter. (c) The result of the new
filter on the test image in Figure 2 (a). (d) The result
by the standard median filter with size 5 x 5.

Next, we display the comparisons on the commonly
used test images Barbara, Boat, Cameraman
and Lena. For each test image, we added impulsive
salt and pepper noises by 10%, 20%, 30% and
40%, respectively, and then converted the noisy images
into JPEG format. Then the adaptive median
filter (AMF), the standard median filter (MF) with
size 5 x 5, and the new filter (NF) with size 5 x 5, a
proper index $j$ and a threshold $T$ were applied on these
images. Images with 20% salt and pepper noises and
the corresponding filtered images by the mentioned
filters are shown in Figures 5 through 8. All PSNR
comparisons are summarized in a table at the end. Because
the test images with noises are saved in JPEG
format, the noise distributions all have minor variations,
shown at the two ends of the intensity range of
each distribution chart in Figure 4.

Figure 4: Intensity distributions of the test images
corrupted with 20% salt and pepper noise, saved in
JPEG format. Top left for Barbara; Top right for
Boat; Bottom left for Cameraman; Bottom right for
Lena.

Figure 5 displays the comparisons on the noise corrupted
image Barbara in JPEG format. The top left
image is the test image corrupted with 20% salt and
pepper noise and the top right is the result by NF.
The bottom left is the result by AMF and the bottom
right is the result by MF. Among the three filtered
images, the image by NF shows the best perceptual
effect.



Figure 5: Top left, the noise corrupted image. Top 
right, the result by the new filter. Bottom left, the 
result by the adaptive median filter. Bottom right, 
the result by the standard median filter. 

Figures 6, 7 and 8 display similar comparisons for 
the noise corrupted test images Boat, Cameraman and 
Lena. 

Among the above images, the results obtained by
the new filter preserve more fine details than the results
obtained by the standard median filter. The results
obtained by the adaptive median filter still contain
noticeable noises, so they are not satisfactory.

Figure 6: Top left, the noise corrupted image. Top
right, the result by the new filter. Bottom left, the
result by the adaptive median filter. Bottom right,
the result by the standard median filter.

Figure 7: Top left, the noise corrupted image. Top
right, the result by the new filter. Bottom left, the
result by the adaptive median filter. Bottom right,
the result by the standard median filter.

Figure 8: Top left, the noise corrupted image. Top
right, the result by the new filter. Bottom left, the
result by the adaptive median filter. Bottom right,
the result by the standard median filter.


The PSNRs of the filtered images for the test images
corrupted with different percentages of salt and
pepper noises are listed in the following table. For
a test image corrupted with noise less than 40%, the
PSNR of the result obtained by the new filter is the
highest. When the noise percentage is higher than
40%, the effect of the new filter is essentially the same
as that of the standard median filter.


                                PSNR for different filters
Image       Noise percentage    AMF      MF       NF
Barbara     10%                 22.75    23.04    24.02
            20%                 18.93    22.38    23.02
            30%                 15.75    22.65    22.68
            40%                 13.72    22.15    22.15
Boat        10%                 23.78    26.37    27.86
            20%                 19.35    26.58    27.24
            30%                 15.95    25.90    26.19
            40%                 13.93    25.04    25.10
Cameraman   10%                 23.12    23.50    24.65
            20%                 13.80    23.29    23.71
            30%                 15.38    22.77    22.83
            40%                 13.31    22.11    22.05
Lena        10%                 24.06    30.33    32.00
            20%                 19.43    29.90    30.57
            30%                 16.02    29.06    29.22
            40%                 13.95    27.75    27.76


4 Conclusion 

A novel method is proposed to eliminate impulse
noises with minor variations. The method uses a filter
to find impulse noises and replace them with
the median intensity values in their neighborhoods,
while the non-noisy pixels are not altered. The proposed
filter outperforms the standard median filter
in preserving fine details because only the noises are
processed. The new filter is immune to minor variations
of the impulses, so it is more widely applicable than the
adaptive median filter, which works well at preserving details
and eliminating noises only when the impulses
have no variations. Theoretically, when the density of
the impulses is not heavy the noises can be correctly
identified. The parameters used in the method can
also be easily adjusted in applications to obtain optimal
results.
Abstract

An algorithm is presented for in situ vision-based detection
of highway lane boundaries on a Raspberry Pi
computer with a Raspberry Pi camera. The Raspberry
Pi unit is placed inside a Jeep Wrangler, next to the
windshield, and is powered through a 12V-to-5V car
charger. The algorithm, called GreedyHaarSpiker, is
based on the detection of 1D Haar Wavelet spikes in
1D Ordered Haar Wavelet Transforms of image rows.
To obtain experimental video data for daytime driving,
the author drove a 2016 Jeep Wrangler with the
installed Raspberry Pi unit on a sunny day in September
2016 (run 1) and a cloudy day with light rain in
November 2016 (run 2) at a speed of 55-60 miles per
hour on Route 30, a two-lane Northern Utah highway.
To obtain video data of snowy roads and night driving,
the author drove the same vehicle on the same
highway and at the same speed on a day after a heavy
snowfall (run 3) in January 2017 and on the same
day after sunset (run 4). Each run was approximately
35 miles long. Each video was segmented into 360
x 240 PNG frames. A sample of 1,000 consecutive
frames was selected from each video. The performance
of the algorithm was tested on a Raspberry Pi 3
Model B ARMv8 1GB RAM computer on each of the
four frame samples. The algorithm is implemented in
Python 2.7.9 with OpenCV 3.0. The current implementation
processes 20 frames per second.

Keywords: Computer Vision, Wavelets, Lane
Detection, Autonomous Vehicles

Nomenclature: CV - Computer Vision, AV -
Autonomous Vehicle, HWT - Haar Wavelet Transform

1 Introduction 

Autonomous vehicles (AVs), i.e., vehicles capable of
navigating various environments without human input,
have featured prominently in many research and
commercial projects for several decades. The CMU
Navigation Laboratory (Navlab) has built a series of
robot cars, SUVs, and buses since 1984. The latest
robotic car, Navlab 11, is a robot Jeep Wrangler
equipped with a range of sensors for obstacle avoidance,
path planning and following, and pedestrian detection
[1]. The European Technology Platform on
Smart Systems Integration project has reported
significant contributions to collision avoidance, fleet
management, autonomous cruise control, and cooperative
driving [2]. Over the past several years, both Google
and Tesla have been commercializing their self-driving
platforms [3, 4].

Proponents of AVs argue that the major benefits of
driverless cars include less traffic congestion, enhanced
mobility for the elderly and the disabled, significant
increases in roadway capacity, and reduction in traffic
accidents [5, 6]. The claim about the reduction in
traffic accidents is typically supported by the argument
that since all driverless cars will use the same
algorithms, they will act predictably and in unison with
respect to each other.

Opponents of driverless cars argue that the
widespread adoption of AVs will not only result in
major losses of driving jobs but will also likely lead
to loss of privacy and increased risks of hacking
attacks and terrorism [7]. Some researchers argue that
lack of stress during driving and more productive time
on the road may create additional incentives to live
even further from cities, which will increase the carbon
footprint of motor transportation systems [8].

While we believe that completely autonomous cars
may become a reality in the long term, provided that
not only technical failures [9, 10] but also social and
legal implications [11] of AV adoption are properly
addressed, human drivers are, and will remain,
indispensable in the short and medium terms. Consequently,
it is important to seek solutions that enhance their
safety. Robust vision-based lane detection is one such
enhancement. Specifically, vision-based lane detection
modules may gradually become an integral part of
autopilots in semi-trucks to improve the drivers’ safety
on long, monotonous highway stretches with low or
no traffic. Such autopilots will be similar to the ones
already in use in aircraft and ships and will keep the
human in the loop in that the decision to engage or
disengage the autopilot will be under the driver’s
control.

In this article, an algorithm, called GreedyHaarSpiker,
is presented for in situ vision-based detection
of marked highway lane boundaries on a raspberry pi
computer with a raspberry pi camera board. It is
assumed that the lane boundaries are marked with white
or yellow lines, as is the case on highways in many
countries. The computer-camera unit is placed inside
a 2016 Jeep Wrangler, next to the windshield, and is
powered through a 12V-to-5V car charger. The
algorithm is based on the detection of 1D Haar Wavelet
spikes [12] in 1D Ordered Haar Wavelet Transforms
(HWT) of image rows. The algorithm is implemented
in Python 2.7.9 and OpenCV 3.

This article is organized as follows. In Section 2,
related work is reviewed. In Section 3, the concept
of a 1D Haar Wavelet Spike (1D HWS) is formally
developed. Section 4 describes the proposed algorithm
and analyzes its pseudocode. In Section 5,
the highway experiments are described and analyzed.
Findings and conclusions are presented in Section 6.

2 Related Work 


Vision-based lane detection has been the focus of
many research and development (R&D) projects in
the past two decades. Wang et al. [13] propose
a B-Snake based lane detection and tracking model
for a range of lane structures. An algorithm, called
CHEVP, is developed for providing initial positions
for the B-Snake model. A minimum error method is
proposed to determine the control points of the
B-Snake model by the image forces on both sides of
a lane. Experimental results suggest that the algorithm
is robust against noise, shadows, and illumination
variations in captured images of marked and
unmarked roads.

Kim [14] presents a lane detection and tracking
algorithm to detect lane curvatures, lane changes,
and splitting lanes. The detected lane markings are
grouped into separate left and right lane-boundary
hypotheses to handle merging and splitting lanes. The
hypotheses are evaluated and grouped with a
probabilistic, Markov-style framework.

Hsiao et al. [15] propose an embedded real-time
lane departure warning system (LDWS) for daytime
and nighttime driving. The LDWS features a lane
detection algorithm based on peak finding for feature
extraction to detect lane boundaries. Gaussian smoothing
and global edge detection are applied to reduce
noise in images. The reported lane detection rates
were 99.57% during the day and 98.88% at night on a
sample of highway images.

Eriksson and Landberg [16] proposed a lane detection
algorithm that uses Hough lines [17] combined
with a parabolic second degree fitting for curvature
detection. On the raspberry pi 2 model, the algorithm’s
performance was found to be inadequate for
high speed driving. However, when the object detection
is removed from the algorithm, the algorithm
meets the real time performance requirements on the
raspberry pi 2 model.

Mandlik and Deshmukh [18] have developed a lane
departure detection system that uses the OpenCV
library [19] to detect vehicle lane departures on the
raspberry pi hardware. The algorithm uses the
OpenCV implementations of the Canny Edge detector
[20] and the Hough Transform [17] to detect straight
and curved lanes. The experiments are conducted on
a toy vehicle with a USB camera mounted on top of
it for sending images of white paper lanes on a black
floor surface to a raspberry pi computer powered by
a laptop.

The position advocated in this article is similar to
the positions advocated in [16] and [18]: to be
economically viable and broadly shareable, vision-based
lane detection algorithms must be implemented and
tested in situ on off-the-shelf low-voltage hardware
platforms such as the raspberry pi. The creation of
replicable hardware and software solutions will enable
citizen science drivers to build, test, and broadly share
replicable driver safety enhancements.


3 Haar Wavelet Spikes 

The GreedyHaarSpiker algorithm described in Section 4
depends on the concept of the 1D Haar Wavelet
Spike developed in this section. In the 1D Haar
Wavelet Transform (1D HWT), a signal is a vector
in $\mathbb{R}^n$, $n = 2^k$, $k \in \mathbb{N}$. Following the formalization in
[21], let $W$ be a $2^k \times 2^k$ matrix for computing $k$
scales of the 1D HWT. This matrix can be effectively
computed from the $n$ canonical base vectors of $\mathbb{R}^n$.
If $x = (x_0, \ldots, x_{2^k - 1})$ is a signal in $\mathbb{R}^n$, then $y$ is the
$k$-scale 1D HWT of $x$ defined in (1).

$$W x^T = y \qquad (1)$$

The transform of the signal is given in (2), where 
= fi(y) and is the coefficient of the i-th basic 
Haar wavelet at scale j [22]. For example, (3) defines 
the matrix for computing the 1D HWT in $\mathbb{R}^2$.

Figure 1: Two types of up-down spikes (above) and
the corresponding Haar wavelets at a given scale k
from a signal (below).

If the input signal is $x = (0, 1, 1, 0)$, then (4) gives
the 1D HWT of $x$ computed as $W x^T = y$, where
$y^T = (0.5, 0, -0.5, 0.5)$.
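
The worked example above can be checked numerically. The following is a minimal sketch, assuming that the ordered 1D HWT is computed by repeated pairwise averaging and differencing with a normalization of 1/2, which is the convention that reproduces the values quoted above. The function name ord_hwt is hypothetical and is not taken from the article's implementation.

```python
def ord_hwt(signal):
    """Ordered 1D Haar wavelet transform of a signal of length 2^k.

    Assumed convention: average = (a + b) / 2, detail = (a - b) / 2.
    """
    n = len(signal)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("signal length must be a power of 2")
    averages = [float(v) for v in signal]
    details = []  # wavelet coefficients, coarsest scale first
    while len(averages) > 1:
        pairs = list(zip(averages[0::2], averages[1::2]))
        # Coefficients of the current (coarser) scale go in front of
        # the coefficients of the finer scales computed earlier.
        details = [(a - b) / 2.0 for a, b in pairs] + details
        averages = [(a + b) / 2.0 for a, b in pairs]
    # The single remaining average is the mean of the whole signal.
    return averages + details


print(ord_hwt([0, 1, 1, 0]))  # [0.5, 0.0, -0.5, 0.5]
```

The first output element is the mean of the signal, the second is the coarsest-scale wavelet coefficient, and the last two are the finest-scale coefficients.
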
It has been theoretically proven that the HWT
can detect significant changes in signal values [23].
In this article, we claim that some changes can be
characterized as signal spikes [12]. Suppose that there
is a finite 1D digital signal. The signal’s values may
first rise and then fall or they may first fall and then
rise. The signal’s values may also have a relatively flat
segment between the rise and the fall or the fall and
the rise. Of course, the signal’s values may remain
flat for the entire duration of the signal, but a flat
signal is not particularly interesting in the sense that
it indicates that the underlying phenomenon modeled
by the signal does not change.

To model the behavior of 1D digital signals, four
types of spikes are postulated: up-down triangle,
up-down trapezoid, down-up triangle, and down-up
trapezoid. The difference between up-down and
down-up spikes is, as their names suggest, the relative
positions of the climb and decline segments. In
trapezoid spikes, flat segments are always in between
the climb and decline segments. One can also view
triangle spikes as trapezoid spikes with zero-length
flat segments.

Fig. 1 shows up-down triangle and up-down trapezoid
spikes. Fig. 2 shows down-up triangle and down-up
trapezoid spikes. In both figures, the lower graphs
represent the possible values of the corresponding
Haar wavelets at a chosen scale k. Up-down spikes
describe signals that first increase and then, after an
optional flat segment, decrease. Down-up spikes
describe signals that first decrease and then, after an
optional flat segment, increase.

Figure 2: Two types of down-up spikes (above) and
the corresponding Haar wavelets at a chosen scale k
from a signal (below).

Let $S$ be a spike. Formally, a spike is a nine-element
tuple whose elements are real numbers given
in (5).

$$S = (u_s, u_e, \alpha, f_s, f_e, \gamma, d_s, d_e, \beta) \qquad (5)$$

The first two elements, $u_s$ and $u_e$, are the abscissae
of the start and end of the spike’s climb segment
$[u_s, u_e]$, respectively, on which the wavelet coefficients
of the 1D HWT increase. If $c^k_{u_s}$ and $c^k_{u_e}$ are the
$k$-th scale wavelet coefficient ordinates at $u_s$ and $u_e$,
respectively, the steepness of the climb, denoted by $\alpha$,
is given in (6).

$$\alpha = \tan^{-1}\left(u_e - u_s,\; c^k_{u_e} - c^k_{u_s}\right) \qquad (6)$$

Figure 3: A 7-inch raspberry pi touchscreen display
where the output of the algorithm is shown. The display
is mounted inside a Jeep Wrangler under the rear
view mirror in the middle of the windshield.

Figure 4: A raspberry pi v2 camera, shown by a red
arrow, is attached to a small cardboard box. The
upper edge of the camera is taped to the windshield.
The raspberry pi computer, shown by a green arrow,
is attached to the back side of the display shown by a
blue arrow.

As shown in Fig. 1 and Fig. 2, the flat segments
of up-down or down-up spikes are described by
$f_s$, $f_e$, and $\gamma$ in (5), where $f_s$ and $f_e$ are the abscissae
of the start and end of the spike’s flat segment,
respectively, over which the wavelet coefficients either
remain at the same ordinate or have minor ordinate
fluctuations.

If $c^k_{f_s}$ and $c^k_{f_e}$ are the $k$-th scale wavelet
coefficients corresponding to $f_s$ and $f_e$, respectively, the
spike’s flatness, denoted by $\gamma$, is defined in (7).

$$\gamma = \tan^{-1}\left(f_e - f_s,\; c^k_{f_e} - c^k_{f_s}\right) \qquad (7)$$

The numbers $d_s$ and $d_e$ in (5) are the abscissae of
the start and end, respectively, of the spike’s decline
segment $[d_s, d_e]$, over which the wavelet coefficients of
the 1D HWT decrease.

If $c^k_{d_s}$ and $c^k_{d_e}$ are the $k$-th scale wavelet
coefficient ordinates at $d_s$ and $d_e$, respectively, the
steepness of the decline, denoted by $\beta$, is given in (8).

$$\beta = \tan^{-1}\left(d_e - d_s,\; c^k_{d_e} - c^k_{d_s}\right) \qquad (8)$$
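
The nine-element spike tuple in (5) and the three angles in (6)-(8) can be sketched in a few lines. This is an illustration under stated assumptions, not the article's code: the two-argument inverse tangent is read here as the angle whose tangent is the ordinate difference divided by the abscissa difference (Python's math.atan2), angles are expressed in degrees, and the names Spike, segment_angle, and make_spike are hypothetical.

```python
import math
from collections import namedtuple

# Nine-element spike tuple mirroring (5); the field names are hypothetical.
Spike = namedtuple("Spike", "u_s u_e alpha f_s f_e gamma d_s d_e beta")


def segment_angle(x_start, x_end, y_start, y_end):
    """Angle in degrees between a segment and the abscissa axis."""
    return math.degrees(math.atan2(y_end - y_start, x_end - x_start))


def make_spike(coeffs, u_s, u_e, f_s, f_e, d_s, d_e):
    """Build a spike from k-th scale coefficients and segment end points."""
    alpha = segment_angle(u_s, u_e, coeffs[u_s], coeffs[u_e])  # climb
    gamma = segment_angle(f_s, f_e, coeffs[f_s], coeffs[f_e])  # flat
    beta = segment_angle(d_s, d_e, coeffs[d_s], coeffs[d_e])   # decline
    return Spike(u_s, u_e, alpha, f_s, f_e, gamma, d_s, d_e, beta)


# Toy up-down trapezoid: climb on [0, 1], flat on [1, 2], decline on [2, 3].
spike = make_spike([0.0, 1.0, 1.0, 0.0], 0, 1, 1, 2, 2, 3)
print(spike.alpha, spike.gamma, spike.beta)
```

With this sign convention the decline angle is negative; a thresholding step that compares steepness against a positive bound would therefore use its absolute value.
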



Figure 5: The detected lane boundaries are graphically
displayed in the bottom right corner of the display.
The green arrow points to the detected left
boundary. The red arrow points to the detected right
lane boundary.

4 GreedyHaarSpiker: Detection 
of Lane Boundaries 

Figures 3, 4, and 5 show the hardware on which
GreedyHaarSpiker, the lane boundary detection algorithm
described in this article, currently runs. In Fig.
3, a seven-inch raspberry pi touchscreen monitor is
shown. As shown in Fig. 4, the monitor is attached
to a raspberry pi 3 model B ARMv8 computer with
1 GB RAM identified by a green arrow. The computer
is attached to the back of the monitor and coupled to
a raspberry pi camera board v2. The camera, identified
by a red arrow, is attached to a small cardboard
box and taped to the windshield for balance. In the
future, more stable structures will be designed and
deployed.

In Fig. 5, the monitor displays the left and right
lane boundaries as they are being detected by the
algorithm in real time as the vehicle is driven. The
whole system is powered through a 12V-to-5V car
charger where the USB power line for the raspberry
pi is plugged.



Figure 6: Sample frame from video 1.


Figure 7: A region of interest with scanlines and
detected lane boundaries.

Figure 8: The lane boundaries detected in the frame
in Fig. 6. The green line marks the left boundary; the
red line marks the right boundary.

Algorithm 1 gives the pseudocode of the procedure
detectLaneBoundaries. The procedure takes as input
a 360 x 240 PNG image like the one shown in Fig. 6.
The region of interest (ROI) in the bottom center of
the image, shown as a white rectangle in Fig. 7,
is cropped. The cropped ROI is grayscaled, blurred
with a 7 x 7 Gaussian kernel, and thresholded with
the Otsu thresholding operator.
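
The thresholding step can be illustrated without OpenCV. The sketch below is a minimal pure-Python version of Otsu's method, which picks the gray level that maximizes the between-class variance of the intensity histogram. It only illustrates the idea; the article's implementation relies on OpenCV instead. The function name otsu_threshold and the toy pixel values are assumptions made for this example.

```python
def otsu_threshold(pixels):
    """Return the 8-bit gray level t that maximizes between-class variance.

    Pixels with value <= t form the background class; the rest form
    the foreground class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    w_b = 0        # number of background pixels so far
    sum_b = 0      # sum of background intensities so far
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        if w_b == 0:
            continue           # no background pixels yet
        w_f = total - w_b
        if w_f == 0:
            break              # no foreground pixels left
        mean_b = sum_b / float(w_b)
        mean_f = (sum_all - sum_b) / float(w_f)
        between_var = w_b * w_f * (mean_b - mean_f) ** 2
        if between_var > best_var:
            best_var, best_t = between_var, t
    return best_t


row = [10, 10, 10, 200, 200, 200]   # dark road, bright lane marking
t = otsu_threshold(row)
print(t, [1 if p > t else 0 for p in row])
```

On a strongly bimodal row such as the toy one above, the dark road pixels and the bright marking pixels end up in different classes, which is what makes the subsequent spike detection on the binarized ROI feasible.
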

The procedure greedyHaarSpiker, outlined in Algorithm 2,
is applied to the preprocessed ROI. This
procedure returns two lists, LPoints and RPoints, of
(x, y) tuples. The procedure fitLine uses linear regression
to fit one line through LPoints and another through
RPoints. The lines are filtered by slope to reduce false
positives. The slope thresholds for the left boundary are
from -60 to -30; the slope thresholds for the right boundary
are from 30 to 60. If the line through LPoints
passes the left slope threshold filter, it is taken to be
the left boundary of the vehicle’s lane. If the line
through RPoints passes the right slope threshold filter,
it is taken to be the right boundary of the vehicle’s
lane.
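
The line fitting and slope filtering described above can be sketched as follows. This is a minimal illustration, assuming that the slope thresholds are inclination angles in degrees and that the fit is an ordinary least-squares regression of y on x in image coordinates (y grows downward, which is why the left boundary has a negative slope). The names fit_line and passes_slope_filter are hypothetical.

```python
import math


def fit_line(points):
    """Least-squares line y = m*x + b through (x, y) points.

    Returns (m, b), or None if fewer than two points are given or the
    points are vertically aligned (the slope is then undefined).
    """
    n = len(points)
    if n < 2:
        return None
    mean_x = sum(x for x, _ in points) / float(n)
    mean_y = sum(y for _, y in points) / float(n)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def passes_slope_filter(line, lo_deg, hi_deg):
    """True if the line's inclination in degrees lies in [lo_deg, hi_deg]."""
    if line is None:
        return False
    angle = math.degrees(math.atan(line[0]))
    return lo_deg <= angle <= hi_deg


left = fit_line([(10, 50), (12, 48), (14, 46), (16, 44)])       # slope -1
right = fit_line([(150, 44), (152, 46), (154, 48), (156, 50)])  # slope +1
print(passes_slope_filter(left, -60, -30), passes_slope_filter(right, 30, 60))
```

A nearly horizontal line, such as one fitted through noise, falls outside both intervals and is rejected, which is how the filter reduces false positives.
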


Algorithm 1 detectLaneBoundaries(Img)

ROI ← cropROI(Img);
ROI ← convertToGrayscale(ROI);
ROI ← gaussianBlur(ROI);
ROI ← thresholdOTSU(ROI);
LPoints, RPoints ← greedyHaarSpiker(ROI);
LeftLaneBoundary ← fitLine(ROI, LPoints);
RightLaneBoundary ← fitLine(ROI, RPoints);
return LeftLaneBoundary, RightLaneBoundary;


Algorithm 2 greedyHaarSpiker(ROI, s_r, e_r, Δ)

LPoints ← [];
RPoints ← [];
LSpike ← NULL;
RSpike ← NULL;
r ← s_r;
while r ≥ e_r do
    LLine ← getLeftScanLine(ROI, r, LSpike);
    RLine ← getRightScanLine(ROI, r, RSpike);
    LHWT ← ordHWT(LLine);
    RHWT ← ordHWT(RLine);
    LSpike ← detectSpike(LHWT);
    RSpike ← detectSpike(RHWT);
    if LSpike ≠ NULL then
        LPoints.add(LSpike.getMidPointOfClimb());
    end if
    if RSpike ≠ NULL then
        RPoints.add(RSpike.getMidPointOfClimb());
    end if
    r ← r + Δ;
end while
return LPoints, RPoints;


The procedure greedyHaarSpiker in Algorithm 2
takes a preprocessed ROI and three integer parameters
$s_r$, $e_r$, and $\Delta$. The parameters $s_r$ and $e_r$ ($s_r > e_r$)
specify the start and end rows, respectively, in the
ROI where the spikes are detected. The parameter
$\Delta$ specifies a step value, a small negative integer, for
generating the exact row numbers where spikes are
detected. For example, if the algorithm is to detect
spikes in the row range [50, 40], i.e., $s_r = 50$ and
$e_r = 40$, with $\Delta = -2$, the sequence of rows that
will be considered is (50, 48, 46, 44, 42, 40). Note that
the spike detection starts from the lower rows that
are closest to the vehicle and moves up to the
rows that are further away from the vehicle.
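
The row sequence in the example above can be generated directly. The following is a small sketch; the function name scan_rows is hypothetical, and the end row is treated as inclusive so that the output matches the sequence listed in the text.

```python
def scan_rows(s_r, e_r, delta):
    """Rows visited from start row s_r down to end row e_r (inclusive)."""
    if delta >= 0:
        raise ValueError("delta must be a negative integer")
    if s_r < e_r:
        raise ValueError("start row must not be above the end row")
    # range() excludes its stop value, hence e_r - 1.
    return list(range(s_r, e_r - 1, delta))


print(scan_rows(50, 40, -2))  # [50, 48, 46, 44, 42, 40]
```
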

The variables LPoints and RPoints contain the
(x, y) tuples that are returned to detectLaneBoundaries
for linear regression line fitting. The variables LSpike
and RSpike contain the spikes most recently detected in
the ordered HWTs of the left and right scanlines,
respectively.

In the while-loop of greedyHaarSpiker, two scanlines,
LLine and RLine, of 64 pixels each are chosen
on the left and right sides of the ROI in row r. The
scanline’s length, i.e., 64, can be changed through a
global variable but it has to be equal to an integral
power of 2. A value of 64 was experimentally found
to result in optimal performance.

If the value of LSpike is NULL, the left scanline
starts at column 0. Similarly, if the value of RSpike
is NULL, the right scanline starts at column $w - 1$,
where $w$ is the width of the ROI, which, in the current
implementation, is equal to 200. If the value of LSpike
is not NULL, which means that a spike was detected
in the previous row, the left scanline, saved in the
LLine variable, is centered on the midpoint of the two
abscissae of the detected spike’s climb segment, i.e.,
$u_s$ and $u_e$ in (5). The flat and
down segments are currently not taken into account
in the algorithm. The right scanline is determined and
saved in RLine in the same way except that the spike
saved in RSpike is used. In Fig. 7, the scanlines are
shown as horizontal white lines on the left and right
sides of each row. As row number r approaches the
upper boundary of the ROI (i.e., $e_r$), the gap between
the left and right scanlines becomes smaller.
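
The placement of the left scanline can be sketched as a small helper. This is an illustration only: it assumes that the midpoint of the previous spike's climb segment is already expressed as an ROI column, and it clamps the scanline to the ROI, a detail the article does not spell out. The name left_scanline_start and its parameters are hypothetical; the defaults 200 and 64 are the ROI width and scanline length quoted in the text.

```python
def left_scanline_start(prev_mid_col, width=200, length=64):
    """First column of the left scanline in the current row.

    prev_mid_col is the ROI column of the midpoint of the climb segment
    of the spike found in the previous row, or None if no spike was found.
    """
    if prev_mid_col is None:
        return 0  # no previous spike: start at the left edge of the ROI
    start = int(prev_mid_col) - length // 2  # center on the midpoint
    # Assumption: keep the whole scanline inside the ROI.
    return max(0, min(start, width - length))


print(left_scanline_start(None), left_scanline_start(100))
```

The right scanline would be handled analogously with the spike saved in RSpike.
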

The procedure detectSpike in the while-loop of the
procedure greedyHaarSpiker uses thresholds for the
angles of the climb, flat, and decline spike segments,
i.e., $\alpha$, $\gamma$, and $\beta$ in (5), and returns the leftmost spike that
clears the thresholds. In the current implementation,
the thresholds are $\alpha = \beta = 60^\circ$ and $\gamma = 5^\circ$. In other
words, the spikes whose climb or decline angles are less
than 60° are filtered out, and flat segments are detected
so long as consecutive wavelet coefficients fluctuate
within ±5° of 0. The algorithm is greedy in that it returns
at most one spike, the leftmost one that clears the
thresholds, in each left scanline and in each right
scanline. All other spikes are ignored. If no spikes clear
the angle thresholds, NULL is returned.
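
The greedy selection rule can be sketched as follows. This is a minimal illustration, assuming that candidate spikes have already been extracted from the ordered HWT and that the angle magnitudes (in degrees) are compared against the thresholds. The dictionary keys and the name detect_spike_from_candidates are hypothetical and do not come from the article's implementation.

```python
def detect_spike_from_candidates(candidates, min_steep_deg=60.0, max_flat_deg=5.0):
    """Return the leftmost candidate that clears the angle thresholds.

    Each candidate is a dict with keys "u_s", "alpha", "gamma", "beta".
    Returns None if no candidate clears the thresholds.
    """
    for spike in sorted(candidates, key=lambda s: s["u_s"]):
        steep_enough = (abs(spike["alpha"]) >= min_steep_deg
                        and abs(spike["beta"]) >= min_steep_deg)
        flat_enough = abs(spike["gamma"]) <= max_flat_deg
        if steep_enough and flat_enough:
            return spike  # greedy: stop at the first (leftmost) match
    return None


candidates = [
    {"u_s": 30, "alpha": 85.0, "gamma": 1.0, "beta": 85.0},
    {"u_s": 5, "alpha": 40.0, "gamma": 0.0, "beta": 70.0},   # climb too shallow
    {"u_s": 12, "alpha": 75.0, "gamma": 2.0, "beta": 80.0},
]
print(detect_spike_from_candidates(candidates)["u_s"])  # 12
```

The candidate at column 30 also clears the thresholds but is ignored because a qualifying spike exists further to the left.
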

When the while-loop of greedyHaarSpiker finishes,
the lists LPoints and RPoints contain (x, y) tuples
representing the mid points of the climb segments of
spikes detected in the left and right scanlines in each
of the processed rows. These points are used by the
procedure fitLine in detectLaneBoundaries to fit lines
through them. The lines identify the left and right lane
boundaries, as shown in Fig. 8.
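
The control flow of the while-loop can be sketched in Python with the scanline extraction, HWT, and spike detection abstracted into caller-supplied functions. This is an illustration of the loop structure only: greedy_scan and the two detector callbacks are hypothetical stand-ins, and a spike is represented simply by the x coordinate of the midpoint of its climb segment.

```python
def greedy_scan(rows, detect_left, detect_right):
    """Collect climb mid points row by row.

    detect_left and detect_right are stand-ins that take (row, previous
    spike) and return the x coordinate of the climb midpoint, or None.
    """
    l_points, r_points = [], []
    l_spike, r_spike = None, None
    for r in rows:
        # Each detector sees the spike found in the previous row, which
        # the real algorithm uses to position the next scanline.
        l_spike = detect_left(r, l_spike)
        r_spike = detect_right(r, r_spike)
        if l_spike is not None:
            l_points.append((l_spike, r))
        if r_spike is not None:
            r_points.append((r_spike, r))
    return l_points, r_points


def toy_left(row, prev):
    return 100 - row            # boundary drifts right as rows go up


def toy_right(row, prev):
    return None if row == 48 else 150 + row  # one row without a spike


print(greedy_scan([50, 48, 46], toy_left, toy_right))
```

Rows in which no spike is found contribute no point, so the two lists may have different lengths before line fitting.
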

5 Experiments 

The images for the experiments were captured with
the hardware shown in Figures 3, 4, and 5. To obtain
experimental video data for daytime driving, the
author drove a 2016 Jeep Wrangler on two different
days in September (run 1) and November 2016 (run 2)
at a speed of 55-60 miles per hour on Route 30,
a two-lane Northern Utah highway with marked lane
boundaries. On each run, the raspberry pi camera
unit was turned on to record the video and save it
on the raspberry pi’s SD card. The first run was on a
sunny day with clear skies and good visibility. The
second run was on a cloudy day with light rain. To
obtain experimental video data for driving on snowy
roads and night driving, the author drove the same
vehicle on the same highway and at the same speed
on a day in January 2017 after a heavy snowfall (run 3)
and on the same day after sunset (run 4).

The first three runs were approximately 35 miles
long. The fourth run was approximately 20 miles long.
The video from each run was segmented into 360 x 240
PNG frames, and a sample of consecutive frames
was selected from it. Samples 1, 2, and 3 had 1,000
frames each, selected from the videos recorded in runs
1, 2, and 3, respectively; sample 4, selected from the
run 4 video, included 775 frames.

Table 1: Lane boundary detection accuracy

SN   NB = 2 (%)   NB ≥ 1 (%)   FP (%)
1    61.90        91.20        1.60
2    34.10        77.40        2.70
3    16.90        64.10        8.30
4    15.74        57.03        11.48

The evaluation of the algorithm’s performance was
done manually by two human evaluators who compared
the lane boundaries drawn in each image by
the algorithm with the actual lane boundaries in the
same image. Each image thus evaluated was placed
into one of three accuracy categories: both boundaries
detected, at least one boundary detected, and no
boundary detected. An actual boundary was considered
detected accurately if the boundary line drawn
by the algorithm was exactly aligned with the actual
boundary.

Table 1 shows the accuracy results on all four image
samples. The column SN, which stands for sample
number, lists the four sample numbers. The second
column, NB = 2, shows the percentage of frames
where the number of detected boundaries (NB) is
exactly 2, i.e., both boundaries are detected. The third
column, NB ≥ 1, shows the percentage of frames
where at least one boundary was accurately detected.
The fourth column gives the percentage of false
positives (FP) in each sample.

In sample 1, both boundaries were accurately detected
in 61.90% of the frames and at least one lane
boundary was detected in 91.20% of the frames. The
percentage of false positives in sample 1 was 1.60%. The
detection results in sample 2 were 34.10% for both
boundaries and 77.40% for at least one lane boundary.
The percentage of false positives in sample 2 was
2.70%. In sample 3, taken on a winter day, the percentage
of frames with both boundaries detected dropped to
16.90% and the percentage with at least one boundary
detected decreased to 64.10%. The percentage of false
positives in sample 3 increased to 8.30%. Finally, in
sample 4, the lane identification performance was the
worst. Specifically, the percentage of frames with both
boundaries detected was 15.74%, the percentage with at
least one boundary detected was 57.03%, and
the percentage of false positives increased to 11.48%.
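
Because sample 4 contains 775 frames rather than 1,000, its percentages are easier to compare with the others when converted to frame counts. The sketch below performs that conversion. The sample sizes and percentages are those given in the text and in Table 1; the frame counts themselves are derived here by rounding and are not reported in the article, and the names SAMPLES and frame_count are hypothetical.

```python
# sample number -> (frames, NB = 2 %, NB >= 1 %, FP %), taken from Table 1
SAMPLES = {
    1: (1000, 61.90, 91.20, 1.60),
    2: (1000, 34.10, 77.40, 2.70),
    3: (1000, 16.90, 64.10, 8.30),
    4: (775, 15.74, 57.03, 11.48),
}


def frame_count(n_frames, percent):
    """Number of frames corresponding to a percentage, rounded to an integer."""
    return int(round(n_frames * percent / 100.0))


for sn in sorted(SAMPLES):
    n, both, at_least_one, fp = SAMPLES[sn]
    counts = [frame_count(n, p) for p in (both, at_least_one, fp)]
    print(sn, n, counts)
```

For sample 4, the derived counts are 122, 442, and 89 frames, which reproduce the tabulated percentages when divided by 775 and rounded to two decimals.
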

Fig. 9 illustrates an inaccurate boundary detection
and a false positive from sample 1. While the left
lane boundary, denoted by a small bright green line, is
accurately aligned with a real boundary, it is aligned
with the boundary of the opposite lane. This misalignment
shows a problem with the greedy approach
in that the algorithm always chooses the leftmost spike
in each scanline. The red line, almost perpendicular
to the bright green line, is a false positive.

Fig. 10 shows two frames from run 1 taken on
a sunny day. The left image shows both boundaries
detected accurately. The right image shows only the
left boundary (green line) detected accurately while
the right boundary is not detected at all. The failure
to detect the right boundary was caused by the
shadow of a semitruck in the opposite lane.

Figure 9: Run 1: a short green line is inaccurately
aligned with a wrong boundary; a red line is a false
positive.

Figure 10: Run 1: both lanes recognized (left); only
the left boundary recognized (right).

Figure 11: Run 2: both lanes recognized (left); no
left boundary recognized and a false right boundary
(right).

Figure 12: Run 3: both lanes recognized (left); a false
left boundary and a correct right boundary (right).

Figure 13: Run 4: both lanes recognized (left); neither
lane recognized (right).

Fig. 11 shows two frames from run 2 taken on a
cloudy day. The left image shows both lanes accurately
recognized. In the right image, the left boundary
is not detected and the right boundary is detected
inaccurately. This is also a case of a false positive.

Fig. 12 shows two frames from run 3 taken after
a heavy snowfall. The left image shows both lanes
accurately recognized. In the right image, the left
boundary is detected inaccurately and the right boundary
is accurately recognized. There were many instances
of detection failures because the lanes were covered by
snow.

Fig. 13 shows two frames from run 4 taken at
night after a heavy snowfall. The left image shows
both boundaries detected accurately. It is interesting
to note that lane detection was better
when there were cars in the opposite lane with their
lights turned on. The right image shows a frame where
neither boundary was detected.

6 Conclusions 

In this article, an algorithm was presented for in situ
vision-based detection of highway lane boundaries on
a raspberry pi computer coupled to a raspberry pi
camera. The algorithm, called GreedyHaarSpiker, is
based on the detection of 1D Haar Wavelet spikes in
1D Ordered Haar Wavelet Transforms of image rows.

The position advocated in this article is that, in
order to be economically viable and broadly shareable,
vision-based lane detection algorithms should be
implemented and tested in situ on off-the-shelf
low-voltage hardware platforms such as the raspberry pi.
The creation of replicable hardware and software
solutions will enable citizen science drivers to build, test,
and broadly share replicable driver safety enhancements.

To address the lane detection problems described
in Section 5, several improvements are being considered.
Recall that, as explained in Section 4, the flat
and down segments are currently not taken into account
in the algorithm. Thus, the first improvement
is to use not just the climb segment of each detected
spike but also the flat and decline segments when
computing the 2D points of a potential line either on the
left or on the right. The second improvement is to
use a different curve fitting algorithm, e.g., a high-order
polynomial, to find the curve that best fits a set
of points. A potential drop in the number of frames
processed per second may be compensated by more
accurate lane boundary detection. The third improvement
is to add geometrical constraints to reduce the
number of false positives.

Graphics, Vision and Image Processing Journal, ISSN 1687-398X, Volume 17, Issue 2, ICGST LLC, Delaware, USA, Dec. 2017


Acknowledgements 

The author is grateful to Vikas Reddy Sudini for helping
him evaluate the algorithm’s performance.


References 

[1] S. Thrun. Toward Robotic Cars. Communications
of the ACM, 53(4):99-106, 2010.

[2] J. Dokic, B. Muller, and G. Meyer. European
Roadmap Smart Systems for Automated Driving.
European Technology Platform on Smart System
Integration, Berlin, Germany, 2015.

[3] T. Simonite. Data shows Google’s robot cars are
smoother, safer drivers than you or I. MIT Technology
Review, Oct., 2013.

[4] G. Nelson. Tesla beams down ’autopilot’ mode
to Model S. Automotive News, Oct. 14, 2015.

[5] C. Mui. Will the Google car force a choice between
lives and jobs? Forbes, Dec., 2013.

[6] T. Lassa. The beginning of the end of driving.
Motor Trend, Jan., 2013.

[7] O. Miller. Robotic cars and their new crime
paradigms. LinkedIn Pulse, Sept. 3, 2013.

[8] M. Ufberg. Whoops: The self-driving Tesla may
make us love urban sprawl again. Wired, Oct. 10,
2015.

[9] D. Yadron and D. Tynan. Tesla driver dies in
first fatal crash while using autopilot mode. The
Guardian, Jul. 1, 2016.

[10] V. Mathur. Google autonomous car experiences
another crash. Government Technology, Jul. 17,
2015.

[11] J. Boeglin. The costs of self-driving cars: reconciling
freedom and privacy with tort liability in
autonomous vehicle regulation. Yale Journal of
Law and Technology, 17(1):Article 4, 2015.

[12] V. Kulyukin and V. R. Sudini. Real-Time Vision-Based
Lane Detection on Raspberry Pi with 1D
Haar Wavelet Spikes. In Lecture Notes in Engineering
and Computer Science: Proceedings of
the International Multi Conference of Engineers
and Computer Scientists, pages 75-80, Hong
Kong, China, March 2017.

[13] Y. Wang, E. Teoh, and D. Shen. Lane detection
and tracking using B-Snake. Image and Vision
Computing, 22:269-280, 2004.

[14] Z. Kim. Robust lane detection and tracking in
challenging scenarios. IEEE Trans. on Intelligent
Transportation Systems, 9(1):16-26, 2008.

[15] P. Hsiao, C. Yeh, S. Huang, and L. Fu. A
portable vision-based real-time lane departure
warning system: day and night. IEEE Trans.
on Vehicular Technology, 58(4):2089-2094, 2009.

[16] J. Eriksson and J. Landberg. Lane departure
warning and object detection through sensor fusion
of cellphone data. Master’s thesis in Applied
Physics and Complex Adaptive Systems, Department
of Applied Mechanics, Chalmers University
of Technology. Göteborg, Sweden, 2015.

[17] R.O. Duda and P.E. Hart. Use of the Hough
transformation to detect lines and curves in pictures.
Comm. ACM, 15:11-15, 1972.

[18] P. Mandlik and A. Deshmukh. Raspberry-pi
based real time lane departure warning system
using image processing. International Journal of
Engineering Research and Technology, 5(6):755-762,
2016.

[19] R. Laganiere. OpenCV 2 Computer Vision Application
Programming Cookbook. Packt Publishing
LTD, 2011.

[20] J.F. Canny. A computational approach to edge
detection. IEEE Trans. on Pat. Anal. and Mach.
Intel., 8:679-698, 1986.

[21] A. Jensen and A. Cour-Harbo. Ripples in mathematics:
the discrete wavelet transform. New
York: Springer, 2011.

[22] Y. Nievergelt. Wavelets made easy. Boston:
Birkhäuser, 2001.

[23] S. Mallat and W. Hwang. Singularity detection
and processing with wavelets. IEEE Trans. on
Information Theory, 38(2):617-643, 1992.




Biographies 

Vladimir A. Kulyukin is an Associate Professor of
Computer Science at Utah State University. He holds
a Ph.D. in Computer Science from the University
of Chicago that he received in 1998.
His research interests include AI,
computer vision, and sensor fusion.






Video Compression using Efficient Encoding Techniques for Low Bit Rate Applications

Poorva Waingankar 1, S.M. Joshi 2

1. Assoc. Professor, Thakur College of Engg. & Technology, Mumbai, India.

2. Professor, Vidyalankar Institute of Technology, Mumbai, India.
p waingankar@gmail. com, 
http ://www.tcetmumbai,in 


Abstract 

This paper presents the use of the Accordion technique
along with modified Run Length Encoding for video
compression, which consists of exploiting the high
amount of temporal redundancy present in videos by
converting it to spatial redundancy and using the 2D DCT.
The video compression steps are either optimized or
completely revamped to meet the compression and video
quality requirements of mobile applications. This technique
has low complexity, which suits lower-end CPUs, and
achieves a very good compression ratio to suit the narrow
bandwidth environments of wireless networks, without
compromising on the quality of the video.


Keywords: H.264, Accordion, Run Length Encoding (RLE),
Discrete Cosine Transform (DCT), Huffman Encoding


Nomenclature 

DCT Discrete Cosine Transform 
Mbps Mega Bits per second 
MSE Mean Square Error 
PSNR Peak Signal to Noise Ratio 
QP Quantization Parameter 
RLE Run Length Encoding 


1. Introduction 

The technological development in the multimedia industry
over the past decade has enabled widespread usage of
internet-based applications and smartphones. Among the
various types of media, video transmission and reception
through wireless networks is important in the context of
universal access. The increase in communication
speed, computing power, and availability of computer
storage facilities has led to a new age of multimedia
applications. Various applications such as mobile
messaging, video conferencing, and social networking
sites require the use of multimedia on a large scale.
Although wireless communication technologies have
been evolving rapidly, the available bandwidth is still of
great value, and so video coding at ultralow bitrates plays
an important role in the development of convergent and
interoperable video-based multimedia services. These
applications need storage of high-quality data, reliable
transmission, and ease of access to content. The volume
of data generated by digitizing a video signal is too
large for most transmission systems. Therefore, digital
video compression is an important aspect in the
realization of these applications. Digital video
compression techniques must meet the demand for quality
and performance within the limits of the available
transmission capabilities. An efficient and well-designed
video compression system gives significant performance
advantages for visual communication at both low and
high transmission bandwidths.

The process of transmission and reception of digital
video from source to destination involves many stages.
The most important are compression (encoding) and
decompression (decoding), in which the
bandwidth-intensive ‘raw’ digital video is reduced to a
manageable size for transmission or storage and then
reconstructed for display. A proper compression and
decompression process can provide better image quality,
greater reliability and/or more flexibility. Therefore,
researchers have a keen interest in the continuing
development and improvement of video compression and
decompression methods involving various innovative
techniques.

In a typical video, the temporal redundancies are often
found to be more significant than the spatial ones.
Current video compression techniques do not fully
exploit these redundancies. It is possible to achieve
more efficient compression by exploiting the
redundancies in the temporal domain. Most techniques
employ motion estimation and compensation to exploit
temporal redundancies. However, the motion estimation
process is computationally intensive, and its real-time
implementation is difficult as well. Considering current
trends and developments in multimedia applications over
the internet and mobile communication, an effective
algorithm which can fully exploit the redundancy would
help to reduce the overall bit rate of
transmission/reception.

2. Video Compression Fundamentals

Uncompressed video produces an enormous amount of data
and needs hundreds of Mbps of bandwidth. Such an amount
of data causes extremely high computational demands even
with powerful computing systems. Hence data compression
is an important aspect of managing such data. There are
two main categories of compression: lossy and lossless.
Lossy methods use transform-based coding, vector
quantization, block truncation, etc., whereas Run Length
Encoding, Huffman Coding and Predictive Coding are used
for lossless compression.

Lossless compression retains the original data, so that
individual image sequences remain the same; hence the
compression rate is smaller in this case. Lossy
compression methods remove image and sound information
that is unlikely to be noticed by the viewer, thereby
significantly decreasing the volume of data. There is
always a trade-off between data size and quality: the
higher the compression ratio, the lower the size and the
quality too. The encoding and decoding processes also
need computational resources, which must be taken into
consideration. Digital video contains a great deal of
redundancy, which is categorized into three types as
given below:

• Spatial redundancy, which is due to the 
correlation or dependence between neighbouring 
pixel values 

• Spectral redundancy, which is due to the 
correlation between different colour planes or 
spectral bands 

• Temporal redundancy, which is present because 
of correlation between different frames in videos. 

The spatial redundancy is reduced by registering 
differences between parts of a single frame; this is known 
as intraframe compression and is closely related to image 
compression. Likewise, temporal redundancy can be 
reduced by registering differences between frames; this is 
known as interframe compression, including motion 
compensation and other techniques. Hence for effective 
video compression, both interframe and intraframe 
techniques are used. A typical video compression
system is shown in Figure 1.



Figure 1. Typical Video Compression Scheme


3. Brief Literature Review 

Digital video compression technologies have become an
integral part of visual information transmitted and
received through wired and wireless networks over the
last one and a half decades. Various standards have been
developed for this purpose, each of which defines a
specific bit stream syntax, imposes very limited
constraints on the values of that syntax, and defines a
limited-scope decoding process. Video codecs are
primarily characterized in terms of the throughput of
the channel, the distortion of the decoded video, delay
and complexity (in terms of computation, memory
capacity, and memory access requirements). The intent is
for every decoder that conforms to the standard to
produce similar output when given a bit stream that
conforms to the specified constraints. Thus, these video
coding standards are written primarily to ensure
interoperability (and syntax capability), not to ensure
quality. This limitation of scope permits maximal
freedom to optimize the design of each specific product
(balancing compression quality, implementation cost,
time to market, etc.). It provides no guarantees of
end-to-end reproduction quality, as it allows even crude
encoding methods to be considered in conformance with
the standard [1][2].

To obtain highly compressed videos without compromising
visual quality, and to make the cost-performance
trade-offs best suited to applications, researchers have
proposed different methods. A multi-objective
optimization technique has been used as a means for
multi-criteria decision making [3]. In it, the
Quantization Parameter (QP) controls the trade-off
between quality and bit rate, in the sense that a QP
increment of 1 results in a 12.5% reduction of bit rate.
For network-related constraints, an optimization
algorithm referred to as the Network State Dependent
Video Compression Rate (NSDVCR), which determines the
compression rates depending on the video characteristics
and the network condition, is proposed [4].

The possibility of dynamic frame skipping to achieve
even higher video compression for low bit rate
applications below 16 kbps is explored by researchers
[5]. Motion compensation is a very important step in
video compression. Control grid interpolation for
block-based motion compensation, like other interframe
compression techniques, produces an approximation of a
frame by reusing data contained in the frame's
predecessor [6]. In another technique, overlapped block
motion compensation [7], a matching block in the past
frame is found for each block in the current frame and,
if suitable, its motion vector is substituted for the
block during transmission. Depending on the search
threshold, some blocks will be transmitted in their
entirety rather than substituted by motion vectors. The
problem of finding the most suitable block in the past
frame is known as the block matching problem. Videos
with few moving elements contain a high level of
temporal redundancy. To avoid the computationally
complex step of motion estimation and compensation, a
new low-complexity DCT-based video compression method is
proposed in which the Accordion representation converts
3D video content into a 2D image, which allows
exploiting the redundancy for high compression [8].

In the subsequent section, the use of the Accordion
representation along with an improved Huffman dictionary
and modified RLE for video compression is presented.

4. Proposed Methodology 

The video signal has high temporal redundancy due to
the high correlation between successive frames. It is
possible to achieve more efficient compression by
exploiting the redundancies in the temporal domain more
fully. The proposed method consists of projecting the
temporal redundancy of each group of pictures into the
spatial domain, where it is combined with the spatial
redundancy in one representation with high spatial
correlation, i.e. the Accordion representation. The
Accordion representation provides a symmetric
encoder-decoder design, avoids the motion compensation
step and reduces blocking artifacts. The Accordion
representation of a video acts as a preprocessing
technique for the DCT to achieve a very good amount of
energy compaction. The flow chart of an implementation
of the proposed algorithm is shown in Figure 2.



Figure 2: Flow Chart of algorithm 


Initially, sub-sampling is implemented for a video frame
by calculating the average pixel value for each group of
several pixels and then substituting this average in the
appropriate place in the approximated image. In general,
whenever sub-sampling is done at the encoder, the
decoder has to reconstruct the original picture with
some approximation by using a technique called pixel
doubling. But in mobile-based applications, since the
screen resolution is lower due to the small size, this
pixel doubling step is avoided, which reduces the
decoder complexity. After considering various factors
like the compression percentage, the computational
complexity and picture clarity, the bilinear
interpolation method is found to give the best
performance in picture clarity with moderate complexity.
After being read into a matrix, the input video is
divided into several groups, with each group consisting
of N frames, where N is the number of frames played per
second, i.e. the fps of the input video. The temporally
similar frames of each group are gathered into one
stretched (two-dimensional) frame by reading each
column of every frame in turn.
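
The column-wise gathering described above can be sketched as follows. This is a minimal illustration, not the authors' MATLAB implementation: the function names are hypothetical, the frames are assumed to be grayscale arrays of equal size, and the `alternate` flag (reversing the temporal order on odd columns so that neighbouring columns come from neighbouring frames) is an assumption about how the Accordion ordering is usually described.

```python
import numpy as np

def accordion_frame(frames, alternate=True):
    """Gather N frames of shape (H, W) into one stretched H x (W*N) frame."""
    frames = np.asarray(frames)
    n, h, w = frames.shape
    out = np.empty((h, w * n), dtype=frames.dtype)
    for c in range(w):
        order = list(range(n))
        if alternate and c % 2 == 1:
            order.reverse()  # assumed accordion ordering on odd columns
        for k, f in enumerate(order):
            # column c of frame f becomes column c*n + k of the stretched frame
            out[:, c * n + k] = frames[f, :, c]
    return out

def inverse_accordion(acc, n, alternate=True):
    """Rebuild the N original frames from a stretched frame."""
    h, wn = acc.shape
    w = wn // n
    frames = np.empty((n, h, w), dtype=acc.dtype)
    for c in range(w):
        order = list(range(n))
        if alternate and c % 2 == 1:
            order.reverse()
        for k, f in enumerate(order):
            frames[f, :, c] = acc[:, c * n + k]
    return frames
```

Because the inverse mapping only rearranges columns, the decoder mirrors the encoder, which is consistent with the symmetric encoder-decoder design mentioned earlier.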

The final step consists of coding the obtained frame.
The image obtained from the previous steps is divided
into blocks of size 8 x 8, which are then transformed
using an 8 x 8 forward DCT. The top-left coefficient in
the 2-D DCT array is referred to as the DC coefficient
and is proportional to the average brightness of the
spatial block. The low-frequency coefficients in the
top-left corner of the array have larger values than the
higher-frequency coefficients. The transform
coefficients are then quantized as per their statistical
properties. Most of the energy is concentrated in the
low-frequency coefficients, and hence the
higher-frequency coefficients, which are the least
important, are harshly quantized or forcibly reduced to
zero to avoid any further processing. The quantization
table is designed to provide the most visually correct
reconstructed image. It is designed according to the
perceptual importance of the DCT coefficients under the
intended viewing conditions. The quality and bit rate of
an encoded image can be varied by changing this array.
The quantization of AC coefficients creates many zeros,
especially at higher frequencies, which can be coded
efficiently.

The following relation is used for quantization.

$$QDCT = \mathrm{round}\left(\frac{8 \cdot DCT}{Scale \cdot Q}\right) \tag{1}$$

where DCT is the DCT coefficient, Scale is the scaling
factor, and Q is the corresponding element of the
quantization matrix.
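
A minimal sketch of the block transform and the quantization relation (1) is given below. It assumes an orthonormal DCT-II and reads relation (1) as 8·DCT divided by the product Scale·Q; the function names and the flat quantization table used in the usage example are hypothetical, since the paper does not list its table.

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis C, so the 2-D DCT of X is C @ X @ C.T."""
    k = np.arange(n).reshape(-1, 1)   # frequency index
    i = np.arange(n).reshape(1, -1)   # sample index
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def quantize_block(block, q, scale=1.0):
    """Forward 2-D DCT followed by QDCT = round(8*DCT / (scale*Q))."""
    c = dct_matrix(block.shape[0])
    coeffs = c @ block @ c.T
    return np.round(8.0 * coeffs / (scale * q)).astype(int)

def dequantize_block(qdct, q, scale=1.0):
    """Undo the scaling of relation (1) and apply the inverse 2-D DCT."""
    c = dct_matrix(qdct.shape[0])
    coeffs = qdct * (scale * q) / 8.0
    return c.T @ coeffs @ c
```

For example, a flat 8 x 8 block of value 100 with a (hypothetical) table of all 16s yields a single non-zero DC coefficient, which illustrates why smooth Accordion frames quantize to long runs of zeros.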

The 2-D array of the DCT coefficients is now 
formatted into a 1-D vector using a zigzag reordering. 
Hence the 8x8 DCT matrix is now converted to a one 
dimensional array of 64 coefficients. These 64 numbers 
are collected by scanning the matrix in zigzag fashion. 
This rearranges the coefficients in approximately 
decreasing order of their average energy (as well as in 
order of increasing spatial frequency) with the aim of 
creating large runs of zero values since it produces a 
string of 64 numbers that starts with some non-zeros and 
typically ends with many consecutive zeros. These runs 
of zeros are further compressed efficiently using the 
modified run length encoding procedure. 
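
The zigzag reordering can be sketched as below. The traversal follows the conventional JPEG-style zigzag, which is assumed here to be the order the paper uses; the function names are illustrative.

```python
def zigzag_indices(n=8):
    """Return the (row, col) pairs of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):  # s = row + col identifies one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        # even diagonals are walked bottom-left to top-right, odd ones reversed
        for r in (reversed(rows) if s % 2 == 0 else rows):
            order.append((r, s - r))
    return order

def zigzag_scan(block):
    """Flatten a square 2-D block (list of lists or array) into a 1-D list."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]
```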

When the DC coefficients of two consecutive DCT
matrices have a large difference, every such unique
difference leads to one unique symbol in the Huffman
dictionary, in turn leading to many code words, which
defeats the purpose of compression. To resolve this
issue, the difference between these coefficients is
coded digit-wise with ten unique symbols, leading to far
fewer code words and consequently a much smaller Huffman
dictionary. This approach has tremendously reduced the
dictionary size and increased the compression ratio.
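
One way to picture the digit-wise coding of DC differences is sketched below. The paper only states that ten digit symbols are used; the sign marker "-" and the terminator ";" are hypothetical additions made here so that the stream can be decoded, and the function names are illustrative.

```python
def encode_dc_differences(dc_values):
    """Code each DC difference as decimal digits instead of one symbol per value."""
    symbols = []
    prev = 0
    for dc in dc_values:
        diff = dc - prev
        prev = dc
        if diff < 0:
            symbols.append("-")          # hypothetical sign marker
        symbols.extend(str(abs(diff)))   # one symbol per decimal digit
        symbols.append(";")              # hypothetical end-of-number marker
    return symbols

def decode_dc_differences(symbols):
    """Rebuild the DC values from the digit-wise symbol stream."""
    values, prev, sign, digits = [], 0, 1, ""
    for s in symbols:
        if s == "-":
            sign = -1
        elif s == ";":
            prev += sign * int(digits)
            values.append(prev)
            sign, digits = 1, ""
        else:
            digits += s
    return values
```

With this scheme the Huffman alphabet for DC terms stays at a dozen symbols no matter how many distinct differences occur, which is the effect on dictionary size that the text describes.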

While carrying out the compression for different videos,
it is observed that apart from the number zero, very few
symbols have frequent repetitions, and hence
conventional RLE is not suitable here. This problem is
resolved in the following manner. After analysing the
input stream of quantized DCT coefficients, the modified
RLE works as follows:

a. There is no Run Length Encoding for non-zero
elements.

b. RLE is applied to all zeros encountered until the
last non-zero element.

c. Once the last non-zero element is encountered, all
the remaining zeros are replaced by a special
end-of-block (EOB) code with 2 ‘Zeros’.
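
Rules a to c can be sketched as follows. The exact stream layout is not given in the paper, so this sketch assumes that a zero run is written as the pair (0, run length) and that the EOB code is the pair (0, 0); the function names are illustrative.

```python
def modified_rle_encode(coeffs):
    """Run-length encode only the zeros of a zigzag-ordered coefficient list."""
    # index of the last non-zero coefficient (-1 if the block is all zeros)
    last = max((i for i, v in enumerate(coeffs) if v != 0), default=-1)
    out, run = [], 0
    for v in coeffs[: last + 1]:
        if v == 0:
            run += 1                # rule b: count zeros before the last non-zero
        else:
            if run:
                out += [0, run]     # assumed layout: zero marker, then run length
                run = 0
            out.append(v)           # rule a: non-zero values are written as-is
    out += [0, 0]                   # rule c: assumed EOB code made of two zeros
    return out

def modified_rle_decode(stream, length=64):
    """Expand the stream and pad with zeros after the EOB code."""
    out, i = [], 0
    while i < len(stream):
        if stream[i] != 0:
            out.append(stream[i])
            i += 1
        else:
            run = stream[i + 1]
            if run == 0:            # EOB reached
                break
            out += [0] * run
            i += 2
    return out + [0] * (length - len(out))
```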

Upon reception of the EOB signal, the receiver
automatically sets all the remaining coefficients along
the zigzag scan to zero. For decoding the bit stream,
exactly the reverse process is carried out step by step.
Once the Accordion frame is reconstructed, the MSE and
PSNR, which are the metrics for reconstructed video
quality, are calculated using the following relations.

$$PSNR = 10 \log_{10}\left(\frac{Max^2}{MSE}\right) \tag{2}$$

where Max is the maximum possible intensity in the
image (e.g. 255 for a sample precision of 8 bits), and
the Mean Square Error (MSE) is given by:

$$MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i,j) - K(i,j) \right]^2 \tag{3}$$

where the number of rows and columns in the image are
m and n respectively. I(i,j) is the intensity of a pixel
at position (i,j) in the original Accordion image, while
K(i,j) is the value of the corresponding pixel in the
compressed and reconstructed Accordion image. The
compression in percent is given by:

$$\%C = \frac{\text{Size of Ori. Video} - \text{Size of Compressed Video}}{\text{Size of Ori. Video}} \tag{4}$$
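
The quality metrics of relations (2) and (3) can be sketched as below, assuming the usual squared-error form of the MSE. The function names and the 8-bit default for Max are illustrative choices.

```python
import numpy as np

def mse(original, reconstructed):
    """Mean square error between two images of equal shape."""
    a = np.asarray(original, dtype=np.float64)
    b = np.asarray(reconstructed, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, reconstructed)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / err))
```

As a point of reference, an 8-bit image whose pixels are all off by exactly one grey level has an MSE of 1 and a PSNR of about 48.13 dB.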


5. Results

After applying the Accordion principle to the frames of
the input video, a stretched frame is formed as shown in
Figure 3. It is constructed from 4 sample frames. It can
be observed that the temporal redundancies present among
the four sample frames are converted to spatial
redundancies in the resulting Accordion frame. This step
acts as a preprocessing tool to make the 2D DCT very
efficient.




After applying the 2D DCT and quantization, it is
observed that the dictionary size reduces to a great
extent by using modified RLE and efficient handling of
DC coefficients, as shown in Figure 4. Table 2 shows
that for the case of 10 frames, the average length of
code words reduces from 3.7375 in the conventional
technique to 2.877 in the improved technique.

Table 2. Codeword-length with improved RLE and DC

Average Length of Code words

Number of Frames       2        4        6        8        10
RLE & DC               3.6556   3.6353   3.6893   3.7314   3.7375
RLE & Improved DC      3.1162   3.1919   3.1603   3.2349   3.2021
Modified RLE & DC      2.9048   2.8795   2.853    2.8981   2.8777



Figure 4. Dictionary Size for RLE and DC Coefficients 





Figure 5. Comparison of compressed Video Size 

Finally, the reduction in the size of the compressed
video obtained with the proposed algorithm can be
observed in Figure 5. The comparison based on PSNR is
shown in Figure 6. It is evident that in spite of using
different techniques to increase the compression, there
is little or no change in the PSNR of the reconstructed
video, which is maintained at around 48 dB. This
indicates that the reconstructed video is of very good
quality.


Figure 3. Stretched Accordion Frame 

Figure 6. PSNR comparison

Further, the scale of quantization was increased from 1
to 5 for 15 frames of input video. The result of varying
the quantization scale is shown in Tables 3(a) and 3(b).
It is observed that by increasing the scale of
quantization, the bit stream and the dictionary size of
the compressed video reduce considerably while
maintaining a good PSNR.


Table 3(a). PSNR vs Quantization Scale

Quantization Scale     PSNR (dB)
1                      48.5058
2                      47.0732
3                      46.0427
4                      44.0471
5                      42.1601


Table 3(b). Bitstream vs Quantization Scale

Quantization Scale     Bit Stream Size
1                      98339
2                      89466
3                      83506
4                      77998
5                      72108


Since the PSNR is in the acceptable range even for a
quantization scale of 5, one can choose the scale and
the compression ratio depending on the required picture
quality. In the next section, concluding remarks are
given.


6. Conclusions

In this paper, the use of the Accordion technique for
video compression is presented. This technique consists
of exploiting the high amount of temporal redundancy
present in videos by converting it to spatial redundancy
and using a 2D DCT. Also, the conventional approaches
related to zigzag processing and Run Length Encoding are
re-designed to get a further optimized Huffman
dictionary. All the algorithms are developed in the
MATLAB environment. On comparing the conventional
techniques and the proposed algorithm, a significant
reduction of 60% in the size of the Huffman dictionary
and a 25% reduction in code word length are found by
processing the DC components in this unique way. This
results in a significant reduction in the size of the
compressed video while maintaining the PSNR at the same
level (around 48 dB). The subjective quality of the
video is observed by varying the quantization scale. The
quantization scale is varied from 1 to 5, and it has
been observed that even with a scale of 5 the
reconstructed video is of good visual quality. This
technique can be effectively used for applications with
slow-moving objects, such as video conferencing,
surveillance, etc. However, the rest of the optimization
techniques will yield a significant additional
compression without losing the video quality measured in
PSNR.

Abstract

Vision-based methodologies provide a more natural and
proficient result when compared with the traditional
strategies that have been utilized for hand gesture
recognition. In this paper, we propose a video-based hand
gesture recognition approach. Our approach commences by
acquiring a video frame from a source and converting it into
a 2D binary frame using the YCbCr color space. We implement
opening and closing operations to filter the noise from the
frame. To track and segment the hand gesture, we use a
Kalman filter and the convex hull along with convexity
defects for detecting hand regions in the frame. Our
framework can currently recognize six kinds of hand gestures.

Keywords: Computer Vision, Convex Hull, Convexity Defect, 
Kalman Filter. 

Nomenclature 

SVM Support Vector Machine 

CNN Convolutional Neural Network 

SAD Sum of Absolute Differences 

HOG Histogram of oriented Gradients 

PCA Principal Component Analysis 

LDA Linear Discriminant Analysis 

1. Introduction 

Gesture recognition is the process of deciphering and
comprehending human gestures by implementing various
algorithms. It is an area in which a large amount of
research has been done and which has numerous applications.
A variety of methodologies has been proposed for gesture
recognition. The data-glove-based methodology uses sensor
devices to digitize both hand and finger movements into
multi-parametric data. Motion-based hand segmentation
approaches rely on the assumption that the features
necessary for gestures will be associated with gestures.
Vision-based methodologies share the issue related to the
vagaries of low-level segmentation. Most image processing
techniques are based on two fundamental techniques: machine
learning and rules.

A vision-based hand gesture recognition system that uses
scale-space feature detection is proposed in [1]. In this
work, the first step is to use a specific hand gesture to
detect the hands, followed by tracking. The segmentation of
hands is done using color cues and motion. Finally, a
scale-space feature detection technique is integrated into
the recognition of gestures. Jesus et al. [2] examine
depth-based hand gesture recognition. The aim is to
highlight gesture classification strategies as well as hand
localization techniques. A detailed study of 37 papers is
made to compare various depth-based gesture recognition
systems on the basis of aspects such as hand localization,
the effects of the low-cost Kinect, OpenNI software
libraries and so on. A video-based hand gesture recognition
method is implemented in [3]. The work focuses on
recognition of hand gestures in a video stream. The proposed
system consists of two procedures, namely hand gesture
detection and hand gesture recognition. Hand detection
begins by locating the hands in the video frames, marked
with blue rectangles, by implementing the Viola-Jones
technique. Hand gesture recognition begins with Hu invariant
moment feature vectors, which are extracted from the
detected hand gestures and then trained and classified using
an SVM.

Another methodology, proposed in [4], uses the modified
census transform for feature extraction in gesture
recognition. The strength of the transform is that it is
illumination invariant. Finally, a linear classifier is used
for recognizing hand gestures. A video-based hand gesture
recognition technique is suggested in [5]. Initially, a
video of the user's hand gesture is captured and stored on
the hard disk. The captured videos are read by the system
one by one and converted into binary images. Then a 3D
Euclidean space is created from the binary values obtained.
A feed-forward neural network is used for training and a
back-propagation neural network for classification. In [6],
a gesture recognition method is proposed which uses
feed-forward neural networks alongside back propagation for
classifying the extracted features. The work compares
various hand gesture recognition techniques using MATLAB.
The use and implementation of skin detection and edge
detection algorithms are also studied. The work in [7]
concentrates on the use of a CNN for hand gesture
recognition from images captured by a camera. To make the
system robust, calibration of hand position and orientation
and a skin model are applied to obtain the training as well
as testing data for the CNN. The Gaussian mixture model
algorithm is used for training the skin model. The
calibrated images so obtained are used for training the CNN.

Xianghua Li proposed a thinning method which involves SAD
to compute matching regions [8]. A depth map is used in the
hand detection stage, which applies the sum of absolute
differences technique to detect the object located in the
foreground. The frame is converted into YCbCr space and then
the convex hull is computed to extract the region of
interest. The background in the obtained region of interest
is removed so that the foreground image, that is the hand
image, can be obtained. A blob labeling algorithm is used
for obtaining a clear hand image. The feature points
extracted using the thinning algorithm are used to recognize
the hand gesture. A similar approach is used by Amiraj in
[9], which uses the convex hull and convexity defects to
count the number of fingers in a video. The primary step is
to capture the video and use it as an input for the system.
The video is converted into frames and thresholding is
applied to separate the hands from the background. Contours
are used to find the location of hands in the video frames.
Algorithms such as convex hull and convexity defects are
implemented for detection and extraction of hands from the
input. Then the hand gestures are classified using various
rules. In [10], an automated method to recognize hand
gestures in varying backgrounds is proposed. A skin color
detection method is used to separate the hand region from
the complex background. A series of morphological operations
is implemented to extract the contour, which is used to
recognize fingertips. The angle of the fingertips is used
for marking the fingertips. The technique demonstrates the
accuracy of the system at low computation cost. Yafei used
the HOG transform to extract hand features, which are then
reduced to a 9D subspace using PCA-LDA [11]. The hand
regions are found by combining an adaptive skin color
detection algorithm with motion detection. The distance
between the projected features and each class of gesture is
calculated. The extracted features are then classified using
nearest neighbor to identify the gesture. The use of hands
instead of a mouse as an input appears to be an instinctive
choice for man-machine interaction.

In this paper, we use the convex hull and convexity defects
to describe the hand gestures. The hands are first detected
using skin color, and various morphological operations are
used to extract the features using convex hull properties.
For tracking, a Kalman filter is used to track the location
of the hands in the video frames. The classification is done
on the basis of a specified rule set. Finally, the results
of the proposed technique are tabulated, which indicates the
precision of the system.

2. Hand Detection 

In order to locate the hand gesture in a video frame efficiently, 
skin color detection and region of interest are computed. 

Skin Color Detection 

Skin color detection is the procedure of identifying the
regions of interest within the spectrum of skin-colored
pixels in an image or a video frame. This methodology is
used in various approaches which include detecting a face,
object, hand, etc. in diverse settings.

Due to varying background conditions and luminance
components, we built our skin color model in the YCbCr color
space in order to approximate the chromaticity of skin. This
computation involves converting RGB to the YCbCr color space
and eliminating the luminance component to make the skin
color more robust to illumination. The histogram of the
resulting 2D color vector produces the region of interest,
which shows a strong peak at the skin color. This conversion
step is illustrated in figure 1.

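The YCbCr conversion of a given pixel from RGB can be
computed with the following matrix:

$$
\begin{bmatrix} Y & C_b & C_r \end{bmatrix}
=
\begin{bmatrix} R & G & B \end{bmatrix}
\begin{bmatrix}
0.299 & -0.1689 & 0.4998 \\
0.587 & -0.3317 & -0.4185 \\
0.114 & 0.5006 & -0.0813
\end{bmatrix}
$$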



Figure 1: Step 1: Conversion of RGB to YCbCr color space. Step
2: Separating Y, Cb and Cr components from YCbCr frame.
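
The conversion and the removal of the luminance component can be sketched as below. The coefficients use the standard JPEG-style signs (each chroma column sums to zero), which is an assumption where the extracted matrix is ambiguous; no +128 offset is applied, and the function names are illustrative.

```python
import numpy as np

# Columns give the weights of R, G, B for Y, Cb and Cr respectively.
RGB_TO_YCBCR = np.array([
    [0.299, -0.1689,  0.4998],
    [0.587, -0.3317, -0.4185],
    [0.114,  0.5006, -0.0813],
])

def rgb_to_ycbcr(rgb):
    """Convert a pixel (3,) or an image (H, W, 3) from RGB to Y, Cb, Cr."""
    return np.asarray(rgb, dtype=np.float64) @ RGB_TO_YCBCR

def chroma_plane(rgb):
    """Drop the luminance Y and keep the 2-D (Cb, Cr) colour vector."""
    return rgb_to_ycbcr(rgb)[..., 1:]
```

A neutral grey pixel maps to zero chroma under these weights, which is why discarding Y makes the skin model less sensitive to illumination changes.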


3. Motion Detection 

Morphological transformations are performed on the resultant
2D grayscale color vector. We begin by thresholding the
grayscale frame. This converts the grayscale image to a
bi-level image and extracts the pixels representing the hand
or an object. A median filter with a 15 x 15 kernel is used
to filter the noise from the resulting frame. A combination
of morphological operations, consisting of binary opening
and closing with a square kernel, is applied to the image to
suppress the remaining noise.




Figure 2: a) Grayscale Image, b) Threshold Operation, c) Median Filter 
Operation, d) Opening Operation, e) Closing Operation. 


The opening of image I by kernel H can be computed as:

$$I \circ H = (I \ominus H) \oplus H \tag{1}$$

where $\ominus$ denotes erosion and $\oplus$ denotes dilation.

To the resulting frame, we performed thresholding operation 
to acquire an optimal frame for computing hand gesture 
features. A series of morphological operations implemented 
over the video frame is shown in figure 2. 
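
The thresholding, opening and closing steps can be sketched with plain NumPy as follows. This is an illustration only: the function names are hypothetical, the kernel is an odd-sized square, the image border is treated as neutral for each operation (an assumption), and the 15 x 15 median filter is omitted for brevity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def threshold(gray, t):
    """Turn a grayscale frame into a bi-level (boolean) frame."""
    return np.asarray(gray) > t

def erode(mask, k=3):
    """Binary erosion: a pixel survives only if its whole k x k window is set."""
    padded = np.pad(mask.astype(bool), k // 2, mode="constant", constant_values=True)
    return sliding_window_view(padded, (k, k)).all(axis=(2, 3))

def dilate(mask, k=3):
    """Binary dilation: a pixel is set if any pixel in its k x k window is set."""
    padded = np.pad(mask.astype(bool), k // 2, mode="constant", constant_values=False)
    return sliding_window_view(padded, (k, k)).any(axis=(2, 3))

def opening(mask, k=3):
    """Erosion followed by dilation; removes specks smaller than the kernel."""
    return dilate(erode(mask, k), k)

def closing(mask, k=3):
    """Dilation followed by erosion; fills holes smaller than the kernel."""
    return erode(dilate(mask, k), k)
```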

Figure 3: Sequence of frames extracted from video using Kalman Filter.




Hand Tracking 

To track hand gestures in real time, we implemented an
optimal estimation framework based on the Kalman filter,
which is extensively adopted for tracking objects because of
its small computational requirements, elegant recursive
properties, uncertainty analysis and prediction of
subsequent frames [12] [13]. In this paper, the Kalman
filter is employed to predict the location of the hand
gesture in a frame. The Kalman filter follows a two-step
procedure for hand tracking: the control update and the
measurement update. The control update estimates the state
from the previous state vector, while the measurement update
corrects this estimate using the sensor information. To
finally predict the position of the hand in the frame, we
blend the Gaussian results produced by prediction and
measurement to obtain the position of the hand, as shown in
figure 3.
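
A minimal sketch of the two-step predict and update cycle is shown below. The paper does not give its motion model or noise settings, so the constant-velocity state (x, y, vx, vy), the class name and all variance values are assumptions chosen for illustration.

```python
import numpy as np

class HandKalman:
    """Constant-velocity Kalman filter for a 2-D hand position (assumed model)."""

    def __init__(self, dt=1.0, process_var=1e-2, meas_var=1.0):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # state transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # only position is measured
        self.Q = process_var * np.eye(4)                  # process noise (assumed)
        self.R = meas_var * np.eye(2)                     # measurement noise (assumed)
        self.x = np.zeros(4)                              # state estimate
        self.P = 1e3 * np.eye(4)                          # large initial uncertainty

    def predict(self):
        """Project the state and its covariance one frame ahead."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Blend the prediction with a measured hand position z = (x, y)."""
        y = np.asarray(z, dtype=float) - self.H @ self.x   # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)           # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]
```

In use, `predict()` would be called once per frame and `update()` whenever the detector returns a hand centroid; when detection fails for a frame, the prediction alone can serve as the hand position.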

4. Projection into Palm Plane 

To project into the palm plane, contours are drawn around
the black blob of the hand obtained after segmenting the
frame. The system might detect multiple contours produced by
noise in the background. We assume that contours produced by
noise are smaller than the contour of the hand. Therefore,
we select the biggest contour in the frame for further
processing. This removes the possibility of considering any
contour formed due to noise.
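
Selecting the biggest contour can be sketched as below, assuming each contour is an ordered list of (x, y) vertices and that "biggest" means largest enclosed area (computed here with the shoelace formula). The function names are illustrative.

```python
def polygon_area(contour):
    """Area enclosed by a closed contour given as ordered (x, y) points."""
    area = 0.0
    n = len(contour)
    for i in range(n):
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]   # wrap around to close the polygon
        area += x1 * y2 - x2 * y1
    return abs(area) / 2.0

def largest_contour(contours):
    """Return the contour with the largest area, or None if there are none."""
    return max(contours, key=polygon_area) if contours else None
```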

Convexity Detection 

The final stage of our system is to detect convexity points
from the extracted contour. This step detects the convex
hull and convexity points of the contour. The convex hull
describes the outer contour of the hand such that all the
contour points lie within the convex hull.

To extract the convex hull, we approximated the hand contour
with a minimum perimeter polygon, which reduces the number
of undesirable convexity points. We used the Douglas-Peucker
algorithm for smoothing the boundary, which recursively
joins the first and last vertices of the polygonal line
segment and finds the vertex furthest from it.
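
The recursive simplification can be sketched as follows for an open polyline. The tolerance `epsilon` and the function names are illustrative; the paper does not state the tolerance it used.

```python
import math

def _point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))           # clamp the projection onto the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping vertices farther than epsilon from the chord."""
    if len(points) < 3:
        return list(points)
    dists = [_point_segment_distance(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1   # farthest vertex
    if dists[idx - 1] <= epsilon:
        return [points[0], points[-1]]  # everything in between is close enough
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right            # drop the duplicated split vertex
```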

To estimate the convex hull points of the approximating 
polygon, we implemented Sklansky's algorithm, which is simple 
and intuitive. The algorithm maintains a stack that, on 
termination, contains the vertices of the convex hull. At each 
step it considers three vertices: the vertex on top of the 
stack, the vertex second from the top of the stack, and the 
new vertex. The top stack vertex is rejected if the trio forms 
a right turn. 
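
The stack-and-turn test can be sketched in a few lines. Note that this sketch swaps in Andrew's monotone chain scan, which applies the same three-vertex turn test to sorted points, rather than reproducing Sklansky's scan over the polygon's vertex order; it assumes mathematical axes (y pointing up), so in image coordinates the turn direction is mirrored.

```python
def cross(o, a, b):
    """Positive if o -> a -> b is a left turn, negative for a right turn, 0 if collinear."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Convex hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def scan(sequence):
        stack = []
        for p in sequence:
            # Reject the top stack vertex while (second-to-top, top, new) is not a left turn.
            while len(stack) >= 2 and cross(stack[-2], stack[-1], p) <= 0:
                stack.pop()
            stack.append(p)
        return stack

    lower = scan(pts)
    upper = scan(reversed(pts))
    return lower[:-1] + upper[:-1]
```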

Convexity defects are computed by measuring the distance 
between the convex hull and the contour point farthest from 
it. The result is filtered by rejecting the convex points 
that do not lie near fingertips. This is done by computing 
the centroid of the enclosed polygon: any convex point whose 
height is less than the height of the center of the palm is 
filtered out. 
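
A minimal sketch of the defect measurement, assuming the contour is an ordered list of (x, y) points and that two consecutive hull vertices are identified by their indices in that list. The names are hypothetical, and the centroid-based filtering step is not included.

```python
import math


def distance_to_line(p, a, b):
    """Perpendicular distance from p to the line through a and b (a and b must differ)."""
    (ax, ay), (bx, by), (px, py) = a, b, p
    return abs((bx - ax) * (py - ay) - (by - ay) * (px - ax)) / math.hypot(bx - ax, by - ay)


def defect_between(contour, start, end):
    """Deepest contour point strictly between the hull vertices at indices start and end."""
    a, b = contour[start], contour[end]
    best_point, best_depth = None, 0.0
    for p in contour[start + 1:end]:
        depth = distance_to_line(p, a, b)
        if depth > best_depth:
            best_point, best_depth = p, depth
    return best_point, best_depth
```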

Hand Gesture Recognition 

This application is developed to identify the number of 
fingers shown in a hand gesture. To classify the number of 
fingers visible in the frame, we used features extracted 
from the frames and counted the number of convex points and 
convexity defect points. Figure 4 illustrates the use of the 
convex hull and convexity defects to find the hand points 
that are needed for recognizing hand gestures. 



Figure 4: (a) Convex Hull of the frame, (b) Extracted Contour, (c) 
Convex and Convexity Defect Points. 

Finger Counting 

Using polylines drawn around the hand, we computed the 
approximate centroid of the hand. For a condition to be 
satisfied, x convex hull points must lie outside a threshold 
distance from the centroid of the hand. A finger count is 
recognized when one of the conditions in Table 1 is 
satisfied: 


No. of fingers | Convex Hull Points (x) | Convexity Defect Points (y)
0 | Exactly 0 | Exactly 0
1 | Exactly 1 | Exactly 0
2 | Exactly 2 | At least 1
3 | Exactly 3 | At least 1
4 | Exactly 4 | At least 1
5 | Exactly 5 | At least 1

Table 1: Condition for recognizing finger counts 
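
The conditions in Table 1 can be expressed as a small lookup. This is a sketch under assumptions: the hull and defect points are given as (x, y) tuples, the threshold is a Euclidean distance from the centroid, and the function name and the `None` result for unmatched inputs are choices made here.

```python
import math


def count_fingers(hull_points, defect_points, centroid, threshold):
    """Apply the Table 1 conditions; return the finger count or None if no row matches."""
    # x: convex hull points lying outside the threshold distance from the centroid.
    x = sum(1 for p in hull_points if math.dist(p, centroid) > threshold)
    # y: number of convexity defect points.
    y = len(defect_points)
    if x == 0 and y == 0:
        return 0
    if x == 1 and y == 0:
        return 1
    if 2 <= x <= 5 and y >= 1:
        return x
    return None
```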


5. Experimental Results 

In this model, certain constraints need to be satisfied to 
recognize the hand gesture and count the number of fingers. 
The system also tracks the hand using the Kalman filter. 
Figures 5 and 6 show the current working model, which can 
track the hand and recognize a limited number of finger 
counts. To determine the classification rate of the system, 
a set of 20 videos is used for each hand gesture. The aim was 
to ensure that the set of videos contains enough data to 
depict each specific hand gesture. The set consists of videos 
showing a single hand performing gestures, with the hand 
occupying the major region of the frame. Table 2 shows the 
classification results of the system. 




Figure 6: Hand Tracking in Binary Video Using Kalman Filter. 


6. Conclusion and Future Work 

In this paper, we presented a vision-based hand gesture 
recognition system which operates on real-time videos on an 
average PC using low-cost cameras. The proposed method is 
currently used to count a limited number of fingers with a 
high classification rate under various constraints. Future 
work involves recognizing multiple hands in a given frame, 
rotation- and orientation-independent gesture recognition, 
and a more efficient and flexible man-machine interaction 
which can be used in real-life applications. 

Abstract 

Encryption is one of the methods used to maintain and 
protect data confidentiality. Depending on the requirements 
of their data types, users need to adopt and implement one 
of the existing methods. However, those encryption methods 
and standards may not comply with the regulations of the 
user's country when users are in different geographical 
locations. Some existing methods have already been 
compromised by hackers, and some government agencies force 
service providers based in their country to hand over 
encrypted information in the name of national security. It 
is very difficult to manage all of these threats with one 
method. The proposed method attempts to reduce the threats 
by approaching them from different points of view. It uses 
images and a block-based encryption method to protect normal 
and sensitive images from unauthorized access. The proposed 
method is tested on all proposed encryption types using 
greyscale images in two scenarios: Different Images One Type 
(DIOT) and Single Image All Types (SIAT). The results are 
evaluated using PSNR, MSE, image size and histogram to 
verify the image's integrity.¹ 

Keywords: Image Encryption, Decryption, Image 
Security, Greyscale Images, Cloud Security. 


Nomenclature 

SCDSPM Secured Cloud Data Storage Prototype Model 

E&DGM Encryption & Decryption Gateway Model 

MDE & DPM Multi-Dimensional Encryption & Decryption Model 

PRA Pixel Rearrange Algorithm 

PRRA Pixel Reverse Rearrange Algorithm 

PSA Pixel Shuffling Algorithm 

PRSA Pixel Reverse Shuffling Algorithm 

DIOT Different Images One Type 

SIAT Single Image All Types 


¹This study has been implemented and tested on the Java 
platform at the Department of Information Technology, 
Bharathiar University, Coimbatore, Tamilnadu, India. 


1. Introduction 

"One picture is worth a thousand words" is a popular 
English saying that has been in use since 1918, when it 
appeared in a newspaper advertisement for the San Antonio 
Light [1]. An image conveys complete information to its 
viewers without losing any piece of it. Sensitive images 
must be safeguarded from general and unauthorized viewers 
in order to protect the confidential nature of their 
contents. For this purpose, different encryption standards 
are applied to sensitive and non-sensitive images to 
maintain image confidentiality and to prevent images from 
being mishandled by unauthorized and unidentified users. 
With respect to this specific objective, the existing 
encryption standards in use are not reliable because of 
their limitations, data processing techniques and algorithm 
architectures. Once images are stored online, the owner of 
the images automatically loses his rights over them. Online 
service providers alter their data handling policies, and 
even reformat the policy-related data, from time to time 
without any interaction with users. The service provider's 
server may be located in another geographical region, and 
encryption and decryption take place only there. Once the 
user encrypts data using a specific service provider, the 
user must decrypt that data using the same service 
provider, although this can be done from anywhere because 
the service is online. If the user uses offline encryption 
tools, the user depends on that device for encryption and 
decryption and must always keep the device at hand to 
perform either whenever necessary. Taking all of this into 
consideration, the Secured Cloud Data Storage Prototype 
Model is designed, and the Multi-Dimensional Encryption and 
Decryption Method is one of its modules. Section 2 reviews 
related work on encryption techniques for maintaining the 
security of data storage. Section 3 describes the working 
methodology, procedure, pseudo code and test file details 
of the MDE & DPM algorithm. Section 4 describes the 
implementation, and Section 5 explains the experimental 
results and discusses the features of the proposed method. 
Section 6 presents the conclusion derived from the 
findings, the advantages of the proposed algorithm and, 
finally, its related future enhancements. 

2. Literature Review 

Sakthidasan et al. proposed a new image encryption design 
that uses one of three dynamic chaotic systems [2] to 
shuffle the positions of the image pixels and another of 
the same three chaotic maps to obscure the relationship 
between the cipher image and the plain image, thereby 
considerably increasing resistance to attacks. The 
algorithm has the advantages of a larger key space, fewer 
iterations and high security under key space analysis, 
statistical analysis and sensitivity analysis [3]. Navitha 
et al. proposed a new, combined approach to DCT-based image 
compression, pixel-shuffling-based encryption, decryption 
and steganography for real-time applications [4]. Quist et 
al. set out to contribute to the general body of knowledge 
in cryptography applications by developing a cipher 
algorithm that encrypts images of size m*n by shuffling the 
RGB pixel values. The algorithm makes it possible to 
encrypt and decrypt images based on their RGB pixel values 
[5]. Junqin et al. introduced a permutation-substitution 
image encryption scheme based on the generalized Arnold 
map. Only one round of permutation and one round of 
substitution are performed to obtain the desired results. 
The generalized chaotic Arnold maps are used to generate 
the pseudo-random sequences for the permutation and 
substitution [6]. Lohit et al. explored the implementation 
of AES in MATLAB for plaintext encryption and ciphertext 
decryption. Their results are reported to be superior to 
similar software implementations of AES [7]. 

3. Methodology 

The existing methods and updated algorithms use different 
concepts, and their implementations are sufficient to 
handle data encryption in offline mode but not in online 
mode. In online mode, i.e. in the cloud [8], the existing 
methods require more time and more resources to perform 
encryption and decryption. Geographically distributed data 
processing servers raise security breach issues and 
cross-border data issues. The data therefore need to be 
encrypted before they are transferred from the user end to 
the server end. The proposed method takes all of these 
considerations into account and provides a prototype model 
with different modules to overcome issues related to data 
storage, retrieval and encryption. 

SCDSPM 

The Secured Cloud Data Storage Prototype Model [9, 10] 
contains four sub-modules: 

• Authentication Authorization Resolving Module [11, 12] 

• Data Type Identification and Extension Validation 
Module [13] 

• Encryption and Decryption Gateway Module [14 - 16] 

• Automatic Cloud Data Backup Module [17] 

This paper explains the third module of SCDSPM, i.e. E & 
DGM. This module is redefined with some modifications and 
is named in this paper the Multi-Dimensional Encryption and 
Decryption Module (MDE & DPM). Figure 1 shows the proposed 
MDE & DPM. 

Multi-Dimensional Encryption and Decryption Module 
framework 

Using a new type of encryption method protects the user's 
data from unnecessary risks. Each encryption and decryption 
logic must be distinct from other methods. Accordingly, the 
proposed encryption algorithm uses new logic, which helps 
to prevent unauthorized access, illicit usage and unlawful 
surveillance of the user's data by unauthorized persons. 



Figure 1. Multi-Dimensional Encryption and Decryption Module 


The proposed Multi-Dimensional Encryption and Decryption 
Module currently handles image files only. This paper 
explains the module together with the standard and 
non-standard images it was tested on and the related 
experimental results. Images of 512 x 512 pixels [18] are 
used for testing. The module contains four different 
algorithms to encrypt and decrypt the image. The four 
algorithms are: 1. Pixel Rearrange Algorithm, 2. Pixel 
Shuffling Algorithm, 3. Pixel Reverse Rearrange Algorithm, 
4. Pixel Reverse Shuffling Algorithm. 

The above-mentioned algorithms are tested with different 
test case images, which include standard and non-standard 
images. Figures 2(a) and 2(b) show the MDE & DPM's 
encryption and decryption methods. 



Figure 2(a). Encryption Method Figure 2(b). Decryption Method 

Pixel Rearrange Algorithm (PRA) 

In the Pixel Rearrange Algorithm, the image pixels are 
rearranged into different positions using the 4X4 matrix 
concept. The pixel values of the image are relocated from 
their original positions to other positions. Relocating 
the pixels changes the original structural content of the 
image. The Pixel Rearrange Algorithm offers 4096 possible 
ways to rearrange the image pixels into new positions 
within the selected 4X4 matrix. The result obtained from 
PRA is passed on to the PSA. 

Figure 3. Before applying PRA and after applying PRA 


Figure 3 shows the image pixel locations before and after 
applying the Pixel Rearrange Algorithm (PRA). 

Pixel Reverse Rearrange Algorithm (PRRA) 

The PRRA is used to return the pixel values relocated by 
the Pixel Rearrange Algorithm (PRA) to their original 
positions. The reversing method takes the information about 
the rearrange method from the decryption key. 

Figure 4. Before applying PRRA and after applying PRRA 

Figure 4 shows the pixel locations before and after 
applying the Pixel Reverse Rearrange Algorithm (PRRA). 
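
The relationship between PRA and PRRA can be sketched as a permutation of the 16 positions of one 4X4 block and its inverse. The paper does not list its actual rearrangements, so `PRA_PERM` below is a hypothetical example and the function names are illustrative only.

```python
# Hypothetical rearrangement: output position i takes the value at position PRA_PERM[i].
PRA_PERM = [5, 0, 11, 14, 2, 9, 7, 12, 15, 4, 1, 8, 3, 13, 6, 10]


def rearrange_block(block, perm):
    """PRA-style step: move the 16 values of a 4x4 block to new positions."""
    flat = [v for row in block for v in row]
    out = [flat[src] for src in perm]
    return [out[r * 4:(r + 1) * 4] for r in range(4)]


def reverse_rearrange_block(block, perm):
    """PRRA-style step: send every value back to its original position."""
    flat = [v for row in block for v in row]
    out = [0] * 16
    for dst, src in enumerate(perm):
        out[src] = flat[dst]
    return [out[r * 4:(r + 1) * 4] for r in range(4)]
```

Because the step only moves values and never changes them, applying the reverse step with the same permutation restores the block exactly, which is why the permutation choice has to travel with the decryption key.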

Pixel Shuffling Algorithm (PSA) 

The Pixel Shuffling Algorithm (PSA) is used to shuffle the 
pixel values within the matrix. This research work provides 
sixteen different pixel value shuffling methods. One of 
these methods is selected automatically (i.e. randomly) and 
applied by the PSA; the selected method is then recorded in 
the decryption key. In every pixel shuffling method, one 
value location is fixed as a constant to identify which 
shuffling method was used to shuffle the pixel values. The 
decryption key is generated automatically by the PSA with 
the PSA-related information, and that information is used 
at the time of decryption. 

Figure 5. Before applying PSA and after applying PSA 


Figure 5 shows the pixel locations before and after 
applying the Pixel Shuffling Algorithm (PSA). In the 
selected pixel shuffling method, the pixel value 14 is 
fixed as a constant value to identify the shuffling method. 

Pixel Reverse Shuffling Algorithm (PRSA) 

The decryption key holds the information about the pixel 
shuffling method that was used. The Pixel Reverse Shuffling 
Algorithm (PRSA) works only with that information. Once the 
PRSA obtains the information from the decryption key, it 
applies the corresponding reverse shuffling method to the 
shuffled image pixel values. Once the shuffling is 
reversed, the result needs to be processed with the Pixel 
Reverse Rearrange Algorithm (PRRA). Only then is the 
original structural content of the image reconstructed. 

Figure 6. Before applying PRSA and after applying PRSA 

Figure 6 shows the pixel locations before and after 
applying the Pixel Reverse Shuffling Algorithm (PRSA). The 
pixel value 14, which is fixed as a constant value, is used 
to identify the shuffling method. 
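
A minimal sketch of the PSA and PRSA interplay, under several assumptions that are not in the paper: the sixteen methods are generated here from seeded pseudo-random permutations, position 13 of each 16-value block is the hypothetical fixed location, and the decryption key is a plain dictionary.

```python
import random

BLOCK = 16
FIXED = 13  # hypothetical position that every shuffling method leaves in place


def method_permutation(method_id):
    """Deterministic permutation for shuffling method 1..16 (illustrative only)."""
    movable = [i for i in range(BLOCK) if i != FIXED]
    shuffled = movable[:]
    random.Random(method_id).shuffle(shuffled)
    perm = list(range(BLOCK))
    for src, dst in zip(movable, shuffled):
        perm[dst] = src
    return perm  # output position i takes the value at position perm[i]


def psa(block, rng=random):
    """Pick one of the sixteen methods at random and record it in the key."""
    method_id = rng.randint(1, 16)
    perm = method_permutation(method_id)
    return [block[j] for j in perm], {"psa_method": method_id}


def prsa(shuffled, key):
    """Reverse the shuffle using the method recorded in the decryption key."""
    perm = method_permutation(key["psa_method"])
    out = [None] * BLOCK
    for i, j in enumerate(perm):
        out[j] = shuffled[i]
    return out
```

Without the key, the reverse step does not know which of the sixteen methods to undo, which mirrors the dependence of PRSA on the decryption key described above.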

The Pixel Rearrange Algorithm (PRA) and Pixel 
Shuffling Algorithm (PSA) are used to encrypt the image. 
The Pixel Reverse Rearrange Algorithm (PRRA) and 
Pixel Reverse Shuffling Algorithm (PRSA) are used to 
decrypt the image. 

Pseudo code for MDE&DMF 

Encryption pseudo code: 

Get the image from the user 
Store that image into an Object 
Read the Object Pixel Values 
Store that Object Pixel Values into a Red, Green and Blue band color Text File 
Get the Pixel Values from that Text Files 
Store that Pixel Values of Text Files as three Objects 
Apply the PRA on all the Objects 
Apply the PSA on all the Objects 
Apply the PRA on all the Objects 
Prepare the Decryption Key with used algorithm method information 
Convert all the Pixel Values Text Files and apply respective color band and merge all that color band files into an Image File 
Store that Image File into selected storage in selected format 
Store that Image Decryption Key into the selected storage in desired format 

Decryption pseudo code: 

Get the image from the user 
Store that image into an Object 
Get the Decryption Key to apply and decrypt the image 
If the key got authenticated Then 
    Forward the process to next step 
Else 
    Show an error message as key is invalid and STOP the process 
Read the Object Pixel Values 
Store that Object Pixel Values into a Red, Green and Blue band color Text File 
Get the Pixel Values from that Text Files 
Store that Pixel Values of Text Files as three Objects 
Apply the PRRA on all the Objects 
Apply the PRSA on all the Objects 
Apply the PRRA on all the Objects 
Convert all the Pixel Values Text Files and apply respective color band and merge all that color band files into an Image File 
Store that Image File into the selected storage in selected format 
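
The order of operations in the two pseudo codes (PRA, PSA, PRA for encryption; PRRA, PRSA, PRRA for decryption) can be sketched with generic permutations. The permutations passed in are hypothetical stand-ins for the paper's rearrange and shuffle methods; file handling and key authentication are left out.

```python
def apply(perm, values):
    """Output position i takes the value at position perm[i]."""
    return [values[j] for j in perm]


def invert(perm):
    """Permutation that undoes `perm`."""
    inverse = [0] * len(perm)
    for i, j in enumerate(perm):
        inverse[j] = i
    return inverse


def encrypt(values, pra, psa):
    """Apply PRA, then PSA, then PRA again, as in the encryption pseudo code."""
    for perm in (pra, psa, pra):
        values = apply(perm, values)
    return values


def decrypt(values, pra, psa):
    """Undo the three steps in reverse order: PRRA, PRSA, PRRA."""
    for perm in reversed((pra, psa, pra)):
        values = apply(invert(perm), values)
    return values
```

For example, with `pra = [1, 2, 3, 0]` and `psa = [3, 2, 1, 0]`, `decrypt(encrypt(list("abcd"), pra, psa), pra, psa)` returns the original list.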

4. Implementation 

The proposed method has been implemented using the MATLAB 
simulation tool and the Java 1.8 programming tool. The 
implementation is divided into two parts. The first part 
reads the three-band image pixel values and writes them 
into text files; it then reads the three-band pixel values 
back from the text files and constructs the image file. 
Five different standard images [19] and two non-standard 
images are used for testing. Each test file contains a 
512 x 512 pixel image. The remaining details of the test 
images are shown in Table 1. 


Table 1. Testing Images 

Image Sl. No. | Image Name | Standard / Normal Image | Image Size
1 | Baboon | Standard Image / Tiff Format | 258 KB
2 | Cameraman | Standard Image / Tiff Format | 256 KB
3 | Lena | Standard Image / Tiff Format | 260 KB
4 | Pirate | Non-Standard Image / Tiff Format | 257 KB
5 | Room | Non-Standard Image / Tiff Format | 258 KB
6 | Peppers | Standard Image / Tiff Format | 206 KB
7 | House | Standard Image / Tiff Format | 106 KB




Figure 8 (a - d). Step by Step Decryption Process. Panels: 8 (b) L1 output Image, 8 (c) L2 output Image, 8 (d) L3 output Image 


The step-by-step operation of the proposed encryption and 
decryption algorithm is applied to the standard Baboon 
image in the "TIFF" [20] image file format [21], and the 
resulting images are shown in Figures 7 and 8. Figure 7(a) 
is the input image, Figure 7(b) is the first level output 
image, Figure 7(c) is the second level output image, and 
Figure 7(d) is the final level output image, i.e. the 
encrypted image. Similarly, Figure 8(a) is the encrypted 
image, Figure 8(b) is the first level output image, Figure 
8(c) is the second level output image, and Figure 8(d) is 
the final level output image, i.e. the decrypted image. 
Sixteen different pixel shuffling methods are available in 
the proposed Pixel Shuffling Algorithm. All of them are 
used for testing in this paper. 


5. Results and Discussion 

No change is found between the histograms of the normal 
image, the encrypted image and the decrypted image. The 
histograms of the Baboon image for the normal, encrypted 
and decrypted images are shown in Figure 9 (a-c). The 
implementation is done in two ways: 

• Different Image One Type method (DIOT) 

• Single Image All Types method (SIAT) 
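
The unchanged histogram noted above is what one would expect from any scheme that only relocates pixel values. A small sketch, using randomly generated stand-in pixel values rather than the test images:

```python
import random
from collections import Counter


def histogram(pixels):
    """Count how often each grey level occurs."""
    return Counter(pixels)


rng = random.Random(0)
pixels = [rng.randrange(256) for _ in range(64)]  # stand-in for image data

shuffled = pixels[:]
rng.shuffle(shuffled)  # any position-only rearrangement

same_histogram = histogram(pixels) == histogram(shuffled)
```

Here `same_histogram` is true for every possible rearrangement, since a histogram ignores where each value sits in the image.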



Figure 7 (a - d). Step by Step Encryption Process. Panels: 7 (a) Encryption Input Image, 7 (b) L1 output Image, 7 (c) L2 output Image, 7 (d) L3 output Image 



Figure 9 (a - c). Histogram of Normal, Encrypted and Decrypted Image 


Figure 9(a) shows the histogram [22] of the normal input 
image, Figure 9(b) shows the histogram of the encrypted 
image and Figure 9(c) shows the histogram of the decrypted 
image. Table 2 shows the 7 different input images and their 
related encrypted and decrypted images in the Different 
Images One Type method (DIOT). The different images are 
processed with one of the proposed encryption algorithms to 
verify how the algorithm works. Here, the Type-16 
encryption algorithm has been used to encrypt the test 
images. In Table 2, images 1a - 7a are the normal input 
images, 1b - 7b are the encrypted images and 1c - 7c are 
the decrypted images. 


Table 2. DIOT related Normal, Encrypted and Decrypted Image (Test Images 1a - 7a: normal input, 1b - 7b: encrypted, 1c - 7c: decrypted) 


Table 4 shows the parameters used to compare the encrypted 
image and the decrypted image with the normal image in 
DIOT. The size of the image, the isequal() function [23], 
PSNR [24] and MSE [25] are taken as parameters and compared 
among the input image, the encrypted image and the 
decrypted image. 


Table 3 shows the input image and the images encrypted by 
the proposed encryption algorithms (i.e. Type-01 to 
Type-16) in the Single Image All Types method, i.e. SIAT. 
All the encrypted images might look the same, but there are 
differences between the Type-01 to Type-16 encrypted 
outputs. The parameters used to measure the differences 
between the encrypted images are shown in Table 5. The 
single image is processed with all the proposed encryption 
algorithms, i.e. Type-01 to Type-16, to verify whether the 
encrypted images of all the algorithms are different from 
one another. 


Table 3. SIAT related Normal, Encrypted and Decrypted Image (Test Types 1 - 16, with histogram) 


Table 4. Comparison of size of the image, isequal() Function, PSNR and MSE for Different Images One Type (DIOT) 

Image Name | Details | Size of the Image | isequal() Function | PSNR Value | MSE Rate
Test Image 1 | Input Test Image Vs Encryption Image | 258 kB | 0 | 33.8481328 dB | 108.08
Test Image 1 | Input Test Image Vs Decryption Image | 258 kB | 1 | Inf dB | 0
Test Image 2 | Input Test Image Vs Encryption Image | 256 kB | 0 | 34.7574709 dB | 87.66
Test Image 2 | Input Test Image Vs Decryption Image | 256 kB | 1 | Inf dB | 0
Test Image 3 | Input Test Image Vs Encryption Image | 260 kB | 0 | 33.7661045 dB | 110.14
Test Image 3 | Input Test Image Vs Decryption Image | 260 kB | 1 | Inf dB | 0
Test Image 4 | Input Test Image Vs Encryption Image | 257 kB | 0 | 33.7901752 dB | 109.53
Test Image 4 | Input Test Image Vs Decryption Image | 257 kB | 1 | Inf dB | 0
Test Image 5 | Input Test Image Vs Encryption Image | 258 kB | 0 | 34.0635209 dB | 102.85
Test Image 5 | Input Test Image Vs Decryption Image | 258 kB | 1 | Inf dB | 0
Test Image 6 | Input Test Image Vs Encryption Image | 206 kB | 0 | 35.1889686 dB | 79.37
Test Image 6 | Input Test Image Vs Decryption Image | 206 kB | 1 | Inf dB | 0
Test Image 7 | Input Test Image Vs Encryption Image | 106 kB | 0 | 33.7953091 dB | 109.40
Test Image 7 | Input Test Image Vs Decryption Image | 106 kB | 1 | Inf dB | 0

Table 5. Comparison of size of the image, isequal() Function, PSNR and MSE for Single Image All Types (SIAT) 

Algorithm Type | Details | Size of the Image | isequal() Function | PSNR Value | MSE Rate
Type 1 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7668073 dB | 108.74
Type 1 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 2 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7640038 dB | 108.81
Type 2 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 3 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7620891 dB | 108.86
Type 3 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 4 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7621153 dB | 108.86
Type 4 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 5 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7555824 dB | 109.02
Type 5 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 6 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7684094 dB | 108.70
Type 6 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 7 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7640273 dB | 108.81
Type 7 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 8 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7596387 dB | 108.92
Type 8 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 9 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7571000 dB | 108.99
Type 9 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 10 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7681899 dB | 108.71
Type 10 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 11 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7622032 dB | 108.86
Type 11 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 12 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7632227 dB | 108.83
Type 12 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 13 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7643787 dB | 108.80
Type 13 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 14 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7658733 dB | 108.77
Type 14 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 15 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7622326 dB | 108.86
Type 15 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0
Type 16 | Input Test Image Vs Encryption Image | 234 kB | 0 | 27.7600431 dB | 108.91
Type 16 | Input Test Image Vs Decryption Image | 234 kB | 1 | Inf dB | 0

The size of the image, the isequal() function, PSNR and 
MSE are taken as parameters and compared among the input 
image, the encrypted image and the decrypted image in DIOT 
and SIAT. These parameters show that the proposed 
algorithms encrypted and decrypted the test images, and 
that the images encrypted by the sixteen different 
encryption algorithms in SIAT differ from one another. The 
equations used to calculate the PSNR value and the MSE rate 
are shown below. 


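$$\mathrm{MSE} = \frac{\sum_{M,N} \left[ I_1(m,n) - I_2(m,n) \right]^2}{M * N} \qquad (1)$$

$$\mathrm{PSNR} = 10 \log_{10} \left( \frac{R^2}{\mathrm{MSE}} \right) \qquad (2)$$


Equation 1 is used to calculate the MSE rate and Equation 2 
is used to calculate the PSNR value. Here I_1 and I_2 are 
the two images being compared and M and N are the numbers 
of rows and columns; R is taken to be the maximum possible 
pixel value (255 for 8-bit images), which is consistent 
with the PSNR and MSE pairs in Table 5. 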
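
Equations 1 and 2 can be sketched directly. The function names are hypothetical, images are assumed to be equally sized lists of rows of grey levels, and the peak value of 255 is an assumption for 8-bit data.

```python
import math


def mse(img1, img2):
    """Equation 1: mean of the squared pixel differences over an M x N image."""
    rows, cols = len(img1), len(img1[0])
    total = sum(
        (img1[m][n] - img2[m][n]) ** 2 for m in range(rows) for n in range(cols)
    )
    return total / (rows * cols)


def psnr(mse_value, peak=255):
    """Equation 2: PSNR in dB; infinite when the two images are identical."""
    if mse_value == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse_value)
```

An MSE of 0 gives an infinite PSNR, which matches the "Inf dB" rows reported for every input versus decrypted comparison in the tables.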



Figure 10. PSNR Differences between 16 Algorithms’ Output 

Figure 11. MSE Differences between 16 Algorithms’ Output 


Figure 10 shows the differences in PSNR value among the 
sixteen proposed encryption algorithms. Figure 11 shows the 
differences in MSE rate among the sixteen proposed 
encryption algorithms. Figures 10 and 11 show that there 
are differences among the sixteen proposed algorithms and 
that each algorithm's encrypted image is different from the 
others. 

6. Conclusion 

Users' rights and privacy should not be affected by online 
image encryption, and users' images need to be encrypted 
within the borders of the users' country. Moreover, the 
encrypted images need to be kept within those borders, and 
users and their service providers need to maintain trust 
between them. These are the most important requirements of 
cloud service users. They are covered by the Secured Cloud 
Data Storage Prototype Model, and within that model the 
confidentiality of images is handled by the 
Multi-Dimensional Encryption and Decryption Model. The 
proposed method has encrypted and decrypted the test images 
successfully in both the multiple-image, single-method 
testing and the single-image, all-methods testing. The 
single-image, all-methods testing has shown that the 16 
encryption methods are not the same; each method produces a 
different encrypted image. The algorithm has been tested on 
512 x 512 pixel images only. In future, the proposed 
algorithm will be extended to handle images of any size, to 
maintain the confidentiality of images online. 

Abstract 

In everyday life, people constantly encounter text images. 
The text in these images may be linear or multi-oriented, 
in either printed or written form. Because text can appear 
in different orientations within an image, recognizing it 
is a challenge for Optical Character Recognition. In this 
paper, real-time recognition of text in different 
rotational variations is presented. Images are acquired by 
a camera and processed in Microsoft Visual Studio. Text 
with different rotational variations is detected and 
recognized by detecting the direction of tilt and computing 
the angle of tilt using geometric and trigonometric 
principles; after counter-rotation, the text is recognized 
by the Tesseract optical character recognition engine.¹ 

Keywords: multi orientation angle, rotational variation, 
tilt angle, tilt direction, Optical Character Recognition. 

Nomenclature 

OCR Optical Character Recognition 

BLOB Binary Large Object 

ROI Region of Interest 

CC Character Confidence 

WC Word Confidence 

1. Introduction 

Text is a human-readable sequence of characters, and the 
words they form, in either written or printed work. These 
characters are often alphanumeric and form series of words. 
Reading text is part of our everyday lives. However, text 
is not always arranged horizontally, the way humans usually 
see and easily read it. Different orientations of text 
exist because of human creativity, and humans can still 
read such text because of their perception. Detecting and 
recognizing text in different orientations, however, is a 
challenge in the field of machine vision. 

Numerous studies have been conducted to advance the 
recognition of multi-oriented text. One study focuses on an 
end-to-end real-time text localization and recognition 
method. Its authors show that real-time performance is 
achieved by posing the character detection problem as an 
efficient sequential selection from the set of Extremal 
Regions. All of the features are scale-invariant, but not 
all are rotation-invariant; however, the features are 
somewhat robust against small rotations [3]. Another study 
proposes a technique to extract text from natural scene 
images, but the proposed system sometimes fails to detect 
and extract text properly because the image may be tilted, 
contain shadowed areas or have a complex background [1]. 

A recent study entitled Text-Line Detection, Segmentation 
and Recognition in Natural Scene presented scene text 
detection and extraction from images, with an algorithm 
that pre-processes images by applying a Wiener filter and 
uses a run-length method to detect the text. This algorithm 
detects not only ordinary text in an image but also blurred 
text. The limitation stated in that study is that text with 
a multi-orientation angle cannot be detected [4]. 

To solve this problem, the researchers propose a system 
that recognizes text with different rotational variations 
by detecting the direction of tilt and computing the angle 
of tilt using geometric and trigonometric principles, then 
applying Optical Character Recognition after counter 
rotation. 

This research is intended to aid existing studies in 
advancing image processing. Its main aims are to make 
reading text under different rotational variations more 
efficient and to present a new method for detecting the 
tilt direction and tilt angle of text characters. It can 
also be useful in further studies on tilt direction and 
tilt angle in character recognition. 

The remainder of the paper is organized as follows: 
Section 2 describes how the system is implemented and 
evaluated. Section 3 presents the results of identifying 
the direction and angle of tilt and the evaluation of the 
system’s reliability. Section 4 concludes the paper. 

2. Theory 

The system is implemented and evaluated on a computer 
with an Intel Core i3 (2 GHz) microprocessor and 4 GB of 
RAM running Windows 10 Home 64-bit Edition. The sensor 
is an A1 Tech AW-06 webcam with a resolution of 640x480 
pixels (30 frames per second). The starting distance is 
35 cm. The samples are printed on 8.5” x 11” paper in 
Calibri font at a font size of 72. 



Figure 1 Conceptual Framework 

Figure 1 shows the whole process of the system. The text 
image acquired by the camera is the input image that the 
system processes. After the image has been acquired, it is 
subjected to image pre-processing, which includes image 
binarization and Canny edge detection. Pre-processing 
removes noise and converts the image to a fully binary 
image. After pre-processing, the Grass Fire Algorithm is 
applied. The Grass Fire Algorithm works by burning pixels 
from a certain point or points to another. In this 
process, pixel burning starts from a seed point of a 
Region of Interest and continues until the entire region 
of interest is covered. After pixel burning, all the pixel 
points that lie on the outermost edge of the burned region 
are stored in the system's knowledge base as outermost 
points. A bounding box is then generated by tracing the 
mean of values from the outermost points, which makes the 
system capable of drawing a tilted bounding box. After the 
bounding box is generated, the direction of tilt and its 
angle are determined. Counter rotation of the image is 
implemented next, considering the direction detected and 
the angle computed. Lastly, the OCR engine is used to 
recognize the text characters. 

A. Identifying the Direction and Angle of Tilt 

The first step of the whole process is text image 
acquisition, which is done by a camera. The source image 
is then subjected to pre-processing, which includes 
binarization and edge detection, to make the image pure 
black and white and to remove noise. After pre-processing, 
the grass fire algorithm is applied. The algorithm starts 
the pixel burning at a seed point and spreads out to the 
entire region of interest that contains that seed point. 
This is why it is also called region growing: all the 
pixels in the region around the seed point are selected as 
part of a new region [4] [5]. Figure 2 illustrates pixel 
burning. 



Figure 2 Pixel Burning 
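
The pixel burning described above can be sketched as a 
breadth-first region growing over a binary image. This is 
a minimal illustration, not the authors' implementation: 
the function name grass_fire, the list-of-rows image 
representation (1 for foreground, 0 for background) and 
the use of 4-connectivity are assumptions made for the 
example. 

```python
from collections import deque


def grass_fire(binary, seed):
    """Return the set of (row, col) foreground pixels burned from `seed`.

    `binary` is a list of rows holding 1 (foreground) or 0 (background).
    Assumption: pixels are connected through their 4 direct neighbours.
    """
    rows, cols = len(binary), len(binary[0])
    if binary[seed[0]][seed[1]] != 1:
        return set()  # the seed is background, so nothing burns

    burned = {seed}
    frontier = deque([seed])
    while frontier:
        r, c = frontier.popleft()
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and binary[nr][nc] == 1 and (nr, nc) not in burned:
                burned.add((nr, nc))
                frontier.append((nr, nc))
    return burned


image = [
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
region = grass_fire(image, (0, 1))
print(sorted(region))  # [(0, 1), (0, 2), (1, 1), (1, 2)]
```

The two foreground pixels in the last column are not 
connected to the seed, so they stay unburned and would 
form a separate region if burned from their own seed. 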

The information about the burned pixels is placed on a 
list or stored in memory, which allows the information 
about the outermost points to be isolated. The system 
first takes the mean of values from the outermost points 
generated by the algorithm. These mean values are used to 
create the straight lines that form the tilted bounding 
box, which makes this step the most essential part of the 
system. The smallest possible bounding box encloses the 
word with all the mean values of the outermost points 
considered, as seen in Figure 3. With these tools, the 
system is able to create bounding boxes for words that are 
inclined. 

Figure 3 Bounding Box 
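
Isolating the outermost points from the burned region can 
be sketched as follows. The paper does not give an exact 
definition, so this example assumes that an outermost 
point is a burned pixel with at least one of its 4 direct 
neighbours outside the region; the function name 
outermost_points is hypothetical. The later step of 
averaging these points into the sides of the tilted 
bounding box is not reproduced here because its details 
are not specified. 

```python
def outermost_points(region):
    """Return the pixels of `region` that touch a pixel outside it.

    `region` is a set of (row, col) tuples, for example the output of a
    region-growing step. Assumption: 4-neighbour adjacency.
    """
    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return {
        (r, c)
        for (r, c) in region
        if any((r + dr, c + dc) not in region for dr, dc in neighbours)
    }


# A filled 3x3 block: every pixel except the centre touches the outside.
block = {(r, c) for r in range(3) for c in range(3)}
edge = outermost_points(block)
print(len(edge))  # 8
```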

After that, significant points are derived for use in 
detecting the direction of tilt and computing the angle of 
tilt. Since the image is composed of pixels lying on the 
Cartesian plane and the bounding box has already been 
generated, some information about the bounding box can be 
established. The bounding box is a rectangle consisting of 
two longer sides, two shorter sides and four corner points 
with x and y coordinates. The significant points are 
always the endpoints of the longer side with the lower y 
coordinate, as seen in Figure 4. 





Figure 4 Significant Points 
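
The selection of the significant points can be sketched as 
below. This is an illustration under stated assumptions, 
not the authors' code: the corners are assumed to be given 
in order around the rectangle, "the longer side with the 
lower y coordinate" is read as the longer side whose 
lowest endpoint has the smaller y value, and the function 
name significant_points is hypothetical. The function 
returns None when both longer sides share the same lowest 
y coordinate. 

```python
import math


def significant_points(corners, tol=1e-9):
    """Return the endpoints of the longer side with the lower y coordinate.

    `corners` holds four (x, y) tuples in order around the rectangle.
    Returns None when both longer sides have the same lowest y value.
    """
    sides = [(corners[i], corners[(i + 1) % 4]) for i in range(4)]
    # Opposite sides of a rectangle are equal, so sides 0/2 and 1/3 pair up.
    if math.dist(*sides[0]) >= math.dist(*sides[1]):
        longer = (sides[0], sides[2])
    else:
        longer = (sides[1], sides[3])
    low_a = min(p[1] for p in longer[0])
    low_b = min(p[1] for p in longer[1])
    if abs(low_a - low_b) <= tol:
        return None
    return longer[0] if low_a < low_b else longer[1]


tilted = [(0, 0), (4, 4), (3, 5), (-1, 1)]
upright = [(0, 0), (2, 0), (2, 4), (0, 4)]
print(significant_points(tilted))   # ((0, 0), (4, 4))
print(significant_points(upright))  # None
```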


There is a special tilt case in which both of the longer 
sides of the bounding box have the same lowest y 
coordinate. In this case, no significant points are 
established, so no direction detection or angle 
computation takes place: the system assumes that the image 
is tilted at 90 degrees and immediately rotates it 90 
degrees clockwise. After the significant points are 
established, the decision-making process in Figure 5 is 
used to detect the direction of tilt by comparing their y 
coordinates. Then, a reference triangle is drawn to get 
the angle of tilt using formula 1, shown with Figure 6. 



Graphics, Vision and Image Processing Journal, ISSN 1687-398X, Volume 17, Issue 2, ICGST LLC, Delaware, USA, Dec. 2017 


[Flowchart: START; get the y coordinates of the 
significant points; compare them; one branch: direction of 
tilt is to the left, rotation is counterclockwise; other 
branch: direction of tilt is to the right, rotation is 
clockwise; END] 

Figure 5 Direction Detection Decision Flow 


To determine the reliability of the system, each sample 
is subjected to eight different tilt cases to see whether 
recognition varies significantly between tilt cases. The 
tilt cases are as follows: case 1 at 0 degrees, case 2 at 
45 degrees, case 3 at 90 degrees, case 4 at 135 degrees, 
case 5 at 180 degrees, case 6 at 225 degrees, case 7 at 
270 degrees, and case 8 at 315 degrees. A correct 
recognition indicates that the rotation was done correctly 
and is marked as a successful rotation and recognition. 
The success rate for each tilt case is computed using 
formula 3. 

\[ \text{Success Rate (per tilt case)} = \frac{\text{total number of successful rotations}}{\text{total number of samples}} \times 100 \tag{3} \]

For the overall reliability, the researchers take the 
average of all success rates, as seen in formula 4. 

\[ \text{Reliability} = \frac{\sum \text{Success Rate (per tilt case)}}{\text{number of tilt cases}} \tag{4} \]
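
Formulas 3 and 4 can be checked against the success rates 
that the paper reports in Table 1. The sketch below 
assumes 100 samples per tilt case, so each percentage in 
Table 1 equals the number of successful recognitions; the 
function names are hypothetical. 

```python
def success_rate(successes, samples):
    """Formula 3: percentage of successful rotations for one tilt case."""
    return 100 * successes / samples


def reliability(rates):
    """Formula 4: mean of the per-case success rates."""
    return sum(rates) / len(rates)


# Successful recognitions per tilt case out of 100 samples (Table 1).
successes = [98, 97, 95, 90, 91, 90, 94, 95]
rates = [success_rate(s, 100) for s in successes]
print(reliability(rates))        # 93.75
print(100 - reliability(rates))  # 6.25
```

The two printed values match the reliability and the error 
rate stated in the results. 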



Figure 6 Computation of Angle of Tilt 

\[ \theta = \tan^{-1}\left(\frac{\Delta y}{\Delta x}\right) \tag{1} \]
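
The angle computation with the reference triangle can be 
sketched as below, taking the angle of tilt as the inverse 
tangent of the vertical difference over the horizontal 
difference between the two significant points. The 
function name tilt_angle_degrees is hypothetical, and the 
use of absolute differences (so that only the magnitude of 
the angle is returned) is an assumption; the direction of 
tilt comes from the separate decision flow in Figure 5 and 
is not covered here. 

```python
import math


def tilt_angle_degrees(p1, p2):
    """Return the magnitude of the tilt angle between two points, in degrees.

    p1 and p2 are (x, y) tuples. atan2 is used instead of atan(dy / dx)
    so that a vertical side (dx == 0) gives 90 degrees without dividing
    by zero.
    """
    dx = abs(p2[0] - p1[0])
    dy = abs(p2[1] - p1[1])
    return math.degrees(math.atan2(dy, dx))


print(round(tilt_angle_degrees((0, 0), (4, 4)), 6))  # 45.0
print(tilt_angle_degrees((0, 0), (4, 0)))            # 0.0
```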

After the direction of tilt is detected and the angle of 
tilt is computed, counter rotation is implemented to bring 
the image back to the zero-degree orientation. Then, 
Tesseract Optical Character Recognition is used together 
with its confidence function [6]. On a scale of 0 to 9, 
with 0 being the best and 9 being the worst, the Tesseract 
OCR engine judges how confident it is that the recognized 
character is really the correct character. These values 
are then fed to formula 2 to compute the word confidence. 
The system performs OCR twice and therefore computes the 
word confidence twice: once on the word’s zero-degree 
orientation and once on its 180-degree counterpart, as 
seen in Figure 7. 


\[ \text{Word Confidence} = \frac{(10 - CC_1) + (10 - CC_2) + (10 - CC_3) + \cdots + (10 - CC_n)}{10n} \tag{2} \]

CC - Character Confidence 
n - Number of Characters in the Word 



Figure 7 Stages of Word Confidence Reading 

After the word confidences are computed, they are used to 
decide the output recognition. The output recognition is 
always the word with the higher confidence. 
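
The word confidence computation and the choice between the 
two readings can be sketched as follows. The sketch 
assumes that formula 2 sums (10 - CC) over the characters 
of the word and divides by 10n, which is consistent with 
the 0 (best) to 9 (worst) scale and with the readings 
below 1 reported in the results. The function names, the 
sample strings, the character confidences and the 
tie-breaking rule are all hypothetical. 

```python
def word_confidence(char_confidences):
    """Word confidence from character confidences on the 0-9 scale.

    0 is the best character confidence and 9 the worst, so a word whose
    characters all score 0 gets a word confidence of 1.0.
    """
    n = len(char_confidences)
    if n == 0:
        raise ValueError("a word needs at least one character")
    return sum(10 - cc for cc in char_confidences) / (10 * n)


def choose_output(first, second):
    """Return the text of the reading with the higher word confidence.

    Each reading is a (text, character_confidences) pair.
    Assumption: on a tie the first (zero-degree) reading wins.
    """
    wc1 = word_confidence(first[1])
    wc2 = word_confidence(second[1])
    return first[0] if wc1 >= wc2 else second[0]


# Made-up readings of one word at 0 degrees and at 180 degrees.
print(word_confidence([0, 0, 0]))  # 1.0
print(choose_output(("TEXT", [0, 1, 0, 1]), ("LX3L", [5, 4, 6, 3])))  # TEXT
```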

B. Acquiring the system’s reliability on recognition of 
every rotated text characters in different rotational 
variations 


3. Results and Discussion 

A. Result of Identifying the Direction and Angle of 
Tilt 



Figure 8 Data Outputs involved in Identifying the 
Direction and Angle of Tilt (a) Source Text Image (b) 
Pre-Processed Image (c) Output After Pixel Burning 
(d) Output After Isolating Outermost Points (e) 
Bounding Box Generated (f) Direction Detected and 
Angle Computed 

Results are gathered from the 100 samples prepared by 
the proponents. Figure 8 shows the data outputs of the 
processes involved in identifying the direction and angle 
of tilt. Figure 8 (a) shows the source image from one of 
the samples and (b) shows the output image after 
pre-processing. Figure 8 (c) shows the output image after 
pixel burning, in which the white part represents the 
entirely burned region of interest, while Figure 8 (d) 
shows the output image after the outermost points have 
been isolated from the region of interest, represented by 
the irregular white line. Figure 8 (e) shows the output 
image after the bounding box, represented by the red box, 
has been generated from the outermost points, and Figure 8 
(f) shows the output image after the direction and angle 
of tilt have been identified, which for that specific 
sample is 45 degrees to the right, written in blue font. 
After the direction and angle of tilt are identified, 
rotation is implemented: the image is rotated by the 
computed angle in the direction opposite to the identified 
direction, to bring the image to the 0-degree orientation 
before recognition. 



Figure 9 Data Outputs After Recognition and Word 
Confidence Reading (a) WC1 greater than WC2 (b) 
Output Recognition (c) WC1 less than WC2 (d) 
Output Recognition 

Figure 9 shows the data outputs of the word confidence 
reading process as well as the recognition outputs after 
the word confidence readings are compared. Figure 9 (a) 
shows the stages of word confidence reading when WC1 is 
greater than WC2. The reading on the first stage is 0.95 
while the reading on the second stage is 0.56, so the 
system takes the first-stage reading as the output, as 
seen in Figure 9 (b). In Figure 9 (c), the word confidence 
reading on the second stage is 0.96, which is higher than 
the first-stage reading of 0.52, so the output recognition 
is taken from the second stage, as seen in Figure 9 (d). 
Figure 10 shows the recognition outputs for one of the 
samples subjected to the eight (8) tilt cases. 



Figure 10 Data Outputs of Recognition for every Tilt 
Case (a) Case 1; 0 degree (b) Case 2; 45 degrees (c) 
Case 3; 90 degrees (d) Case 4; 135 degrees (e) Case 5; 
180 degrees (f) Case 6; 225 degrees (g) Case 7; 270 
degrees (h) Case 8; 315 degrees 

B. Acquired system's reliability on recognition of 
every rotated text characters in different rotational 
variations 


TILT CASES           SUCCESS RATES (%) 
1 (0 Degree)         98 
2 (45 Degrees)       97 
3 (90 Degrees)       95 
4 (135 Degrees)      90 
5 (180 Degrees)      91 
6 (225 Degrees)      90 
7 (270 Degrees)      94 
8 (315 Degrees)      95 
RELIABILITY:         93.75% 

Table 1 Results of the Study 

From the 100 samples, the success rate computed for each 
tilt case is as follows: 98% for the first tilt case, 97% 
for the second, 95% for the third, 90% for the fourth, 91% 
for the fifth, 90% for the sixth, 94% for the seventh, and 
95% for the eighth. The overall reliability of the system 
in recognizing rotated text characters is 93.75%. Table 1 
summarizes the success rates computed per tilt case and 
the reliability of the system. Cases 4, 5 and 6 have the 
lowest success rates because, in these cases, the first 
rotated image always ends up in the 180-degree 
orientation. When optical character recognition is 
implemented, the word confidence is calculated on this 
first rotation, which raises the chance that the first 
reading comes out higher than the second word confidence 
reading. An error rate of 6.25% was also determined. The 
errors consist of word misjudgment and word rearrangement 
errors. Figure 11 shows the data outputs for a misjudgment 
error. As seen in Figure 11 (a), the sample “SOW” was 
recognized as “MOS”. This means that the recognition took 
place on the 180-degree orientation of the word, because 
the 180-degree orientation formed another word with a 
higher word confidence value than the original word, which 
confuses the system. Figure 11 (b) shows the word 
confidence on the two stages of rotation. The sample has 
an output word confidence of 0.9 on the first rotation and 
0.96 on the second rotation, which is higher than the 
first reading. Because of that, the output of the system 
is the second stage of recognition, which is the 
180-degree counterpart of the sample. This kind of error 
occurs for some words and in some tilt cases, depending on 
the combination of characters in the word. The error 
sometimes happens because of the varying rotation and 
sometimes solely because of the combination of characters 
in the word. This error happened frequently in cases 4, 5, 
and 6. 



Figure 11 Data Outputs for Misjudgement Error (a) 
Word Confidence Reading (b) Output Recognition 
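
The misjudgment error can be illustrated with a small 
sketch of why “SOW” turned by 180 degrees reads as “MOS”: 
the letter order is reversed and each letter is replaced 
by the letter it resembles upside down. The lookup table 
and the function name read_upside_down are hypothetical 
and only cover a few capital letters; they are not part of 
the system described in the paper. 

```python
# Hypothetical table of capital letters that still look like a letter when
# turned by 180 degrees; a real OCR engine may confuse other pairs as well.
ROTATED_180 = {"S": "S", "O": "O", "W": "M", "M": "W",
               "H": "H", "I": "I", "N": "N", "X": "X", "Z": "Z"}


def read_upside_down(word):
    """Return the word as it reads after a 180-degree turn.

    Returns None if any letter has no counterpart in the table, meaning
    the upside-down word is unlikely to be read as another valid word.
    """
    if any(ch not in ROTATED_180 for ch in word):
        return None
    # A 180-degree turn reverses the letter order and flips each letter.
    return "".join(ROTATED_180[ch] for ch in reversed(word))


print(read_upside_down("SOW"))   # MOS
print(read_upside_down("TEXT"))  # None
```

Words made only of such letters give two plausible 
readings, so the comparison of word confidences alone 
cannot tell which orientation is the intended one. 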



Figure 12 Data Output for Word Rearranging Error 

Figure 12 shows the data output for a word rearrangement 
error. The sample “die pick die place die attach” was 
rearranged during recognition when subjected to rotational 
variations, becoming “attach place die pick die die”. This 
happens because the system reads the recognized words from 
top to bottom, left to right, disregarding the arrangement 
of the words when subjected to rotation. 


4. Conclusion 

The researchers developed a system that can recognize 
text in different rotational variations by obtaining the 
right orientation of the text through acquisition of the 
direction and angle of tilt using geometric and 
trigonometric principles. Even though the reliability of 
the system is high (93.75%), there are still cases in 
which the system fails to recognize the text properly. The 
first occurs when a word, rotated 180 degrees, yields a 
combination of characters that forms another word, which 
confuses the system when it chooses between the two words 
recognized from the two rotated images. This happens 
frequently in cases 4, 5, and 6 because the word being 
processed first is upside down. The second is caused by 
multiple lines of words: when subjected to rotational 
variations, the words are rearranged, producing an output 
of disordered words. This happens because the system reads 
the recognized words from top to bottom. Regardless of 
these errors, the proponents conclude that the system can 
still detect and recognize text in different rotational 
variations. 